ABSTRACT

The journal articles reprinted in this publication 
were selected to aid scholars interested in doing research on the 
teaching of college economics or in learning about the results of 
such research. The preface points out that the readings are not 
intended to provide advice on how to teach economics better. With the 
exception of one, all of the articles have been previously published 
in scholarly journals and subjected to their review procedures. The 
28 articles are organized into four main sections: surveys of the 
literature, critiques of research methodology, foundations of 
research, and research reports. Some examples of the types of
articles included are: Male-Female Differences in Economic Education: 
A Survey; Student Performance and Changes in Learning Technology in 
Required Courses; The Efficiency of Programmed Learning in Teaching 
Economics; The Lasting Effects of Introductory Economics Courses; 
Teacher Effectiveness and Student Performance; Student-to-Student 
Tutoring in Economics; and Test Information. (RM) 
Preface

John J. Siegfried and Rendigs Fels 



We have selected these readings to aid 
scholars interested in doing research on the 
teaching of college economics or in learning 
about the results of such research, rather 
than to provide advice on how to teach eco- 
nomics better. While the ultimate purpose of 
all this research is to improve the teaching 
and learning process, its immediate purpose
is to accumulate more knowledge about 
what happens in the classroom. Most of the 
papers in this book are therefore mainly 
methodological or not directly applicable to 
teaching. A few do contain information rele- 
vant to classroom procedures. However, 
much richer sources of specific advice about
how to teach economics are Resource
Manual for Teacher Training Programs in
Economics (1978)*, edited by Phillip Saun- 
ders, Arthur L. Welsh, and W. Lee Hansen, 
as well as other materials used in the Teacher 
Training Program of the Joint Council on 
Economic Education. 

In one sense, research on economic educa- 
tion is an old field, in another sense it is quite 
new. When Laurence E. Leamer wrote his
"Brief History of Economics in General 
Education" for the American Economic 
Review (1950), he found a voluminous liter- 
ature already in existence. But previous 
authors had written so casually as to suggest 
that no one had ever considered the subject 
before and that no one would bother to go 
on from where they had left off. Leamer
ended with a plea for economists to "build a 
cumulative literature on their subject" and 
added that "progress in economic education 
is hardly to be made in the next sixty years if 
we are to persist, as we have in the past, in 
retracing previously trod paths" (p. 33). 

A decade went by before Leamer's plea
began to be heeded. Simon N. Whitney
pioneered with his "Measuring the Success of
the Elementary Course" (1960), but his was
an isolated effort. In 1961, the American
Economic Association (AEA) began to pub- 
lish material from its sessions devoted to 
economic education (see the annual Papers 
and Proceedings issues of the American
Economic Review). In the early years these
papers mainly consisted of informal obser- 
vations rather than findings based on gen- 
uine research. In the mid-1960s, however, 
several developments helped stimulate the 
beginnings of a cumulative literature: the 
Test of Economic Understanding (TEU) was 
published; a television series on "The 
American Economy" was shown nationwide; 
programmed instruction (PI) appeared; the 
Test of Understanding in College Economics 
(TUCE) was developed. Their impact was 
enhanced by the coming of age of the com- 
puter, by the decision of the Joint Council on 
Economic Education to add a college-and- 
university program to its existing activities 
in lower schools, and by the thrust to re- 
search given by the AEA's Economic Educa- 
tion Committee under the leadership of G. L. 
Bach. 

Although the TEU was designed for high 
schools, it was nevertheless used to conduct 
research on the effectiveness of the TV 
presentations of "The American Economy," 
for which a number of institutions offered 
college credit to participants (Saunders, 
1964; McConnell and Felton, 1964; Bach and 
Saunders, 1965, reading 18 below). The 
paper by Bach and Saunders was the first 
major research report on economic educa- 
tion ever to be published, at least in this 
country, and it is still an interesting docu- 
ment. 

Although programmed instruction is no 
longer so greatly in vogue, its advent in the 
1960s stimulated much enthusiasm. A number
of books on PI appeared, and research
was undertaken on its effectiveness. A 



*All publications mentioned here are cited in full in the References at the end of this Preface or are reprinted in this
book. 



notable result was the second major research 
paper on economic education. It was written 
by Attiyeh, Bach, and Lumsden (1969), and 
is reprinted below. The authors were able to 
use a preliminary version of the TUCE, a test 
suitable for use as a measuring instrument in 
colleges. TUCE sparked a considerable ex- 
pansion of research activity. The major paper 
in this wave was Saunders' study of the 
lasting effects of economic education, also 
reprinted here. 

In the late 1960s, Keith G. Lumsden edited 
two books (1967; 1970) of papers presented 
at two conferences he organized. They were 
the first books entirely devoted to research 
on the teaching of college economics. 

Another big step was taken in 1969, when 
the semi-annual Journal of Economic Educa- 
tion (JEE) began publication. In publishing 
the JEE, the initial emphasis of the Joint 
Council on Economic Education (JCEE) was 
on encouraging research rather than pro- 
viding an outlet for worthy studies that 
might not otherwise get into print. Ulti- 
mately the JEE did elicit an enlarged body of 
worthwhile research, though progress was 
slow at first. 

In 1973, we thought the time might be ripe 
for a book of reprints of the best work done 
so far. We drew up a proposed table of con- 
tents and showed it to friends, but their reac- 
tion discouraged us from proceeding. At 
about the same time, the editor of the Journal
of Economic Literature (JEL) commissioned
us to write an article surveying the literature
devoted to research on teaching college
economics. Fortunately, both the book
of readings and the survey were delayed, for 
in the next five years the sophistication of 
the research increased markedly. As we 
wrote the survey — the first article reprinted 
below — we became convinced that the time 
had come to publish the volume of readings 
we had proposed several years earlier. The 
dates of the material reprinted in this book 
testify to what had happened to the quality 
of research in economic education in the in- 
terim: of the 28 papers included here, 19 
were published after mid-1975. 

Neither in our survey article nor in this 
book do we cover the literature on precol- 
lege economic education. There are several 
reasons. Research on economic education is
carried out primarily by economists in colleges
and universities; consequently it tends
to be about education at such institutions. 
Our own knowledge is primarily about those 
levels. The readers in whom we can expect to
arouse the most interest are college teachers.
And in any event, keeping up with the col- 
lege and university literature these days is a 
job in itself. Those interested in the pre- 
college literature should consult the survey 
by Dawson (1976). Earlier surveys were conducted
by Leamer (1950), Fels (1969), Lewis
and Orvis (1971), and Wells (1974).

Our survey article in reality constitutes 
the introduction to this book. It summarizes 
and discusses most of the papers selected for 
inclusion and obviates the need for us to say
anything more about them. We have, how- 
ever, included a few items we did not, for 
one reason or another, mention in the sur- 
vey. William F. Barnes's "Test Information: 
An Application of the Economics of Search," 
as do most of the other papers, provides 
answers to the question raised by Burton F. 
Weisbrod, viz, Why should economists do 
the research on economics education rather 
than leaving the subject to psychologists and 
educationists? Robert J. Staaf's "Student Per- 
formance and Changes in Learning Technol- 
ogy in Required Courses" gives some of the 
results reported in a book we discuss in our 
survey. "Student-to-Student Tutoring in 
Economics," by Allen C. Kelley and Caroline 
Swartz, and "Textbooks and the Teaching of
Economic Principles," by Marion Meinkoth, 
are relevant to our subject although they 
could not conveniently be fitted into the 
survey. 

There are four sections below. The first, 
"Survey of the Literature," consists of two 
papers: our survey for the JEL and a paper 
by one of us, Siegfried, originally written as 
part of our survey but omitted because of
space constraints. Section II, "Critiques of 
Research Methodology," contains five com- 
mentaries on research in economics educa- 
tion. Section III, "Research Foundations," in- 
cludes a paper on educational production 
functions by Eric A. Hanushek. Although it 
is not directly concerned with the teaching of 
economics in college, it is valuable for 
anyone concerned with research on the 
topic. Section III also includes two papers on 
multiple choice tests in general and the 
TUCE in particular. Section IV, "Research 
Reports," makes up about half the book.
These reports present substantive findings 
on which we believe and hope that future in- 
vestigators can build their work. 
SOURCE: Journal of Economic Literature, vol. 17, no. 3, September 1979, pp. 923-969. Reprinted with permission of
the publisher.

I. Introduction

DURING the past fifteen years, an extensive
research literature has developed on the
subject commonly called economic education
(not to be confused with the economics of
education). 1 Economists with established
reputations have made contributions, the
Journal of Economic Education has completed
almost a decade of publication, and some
economists regard the field as their primary
specialty. 2 This survey is as diverse as the
research it reports, but several themes recur.
(1) In preparing this paper, we became
impressed by the usefulness of the tools of
economic theory and econometrics for
analyzing teaching problems. Burton F.
Weisbrod has raised the question: Why
should economists investigate teaching
problems at all? Why not leave the subject

1 The economics of education is concerned primarily
with the benefits, costs, production, and financing
of the dissemination of knowledge of any
and all subjects. Economic education has a similar
concern regarding the dissemination of knowledge
about the single subject of economics. Most of the
research evaluates alternative strategies for teaching
economics successfully. The field is sometimes called
economics education, a term that is better English

2 In the 1978 Directory of the American Economic
Association, 45 economists indicated that economic
education was their primary specialty [4, 1978, pp.
426-31].

to educational specialists and psychologists?
3 [171, 1979]. The most telling
answer lies in the nature of the work al- 
ready done. The economist brings to re- 
search on teaching certain skills and ways 
of thinking indispensable for sophisticated 
investigation of the subject. (2) Another 
theme is diversity. Different students 
learn in different ways. A menu of tech- 
niques may dominate any single one. (3) 
From the contributions of psychologists 
comes the theme that instruction targeted 
at specific objectives may be better than 
instruction not so targeted. (4) We shall 
be concerned with the extent to which 
training and experience lead to better 
teaching, particularly whether training in 
teaching as distinct from training in sub- 
ject matter has anything to contribute. (5) 
We shall also be concerned with the 
search for innovative teaching methods 
that work. 

A. Omissions 

The present survey concentrates on col- 
lege teaching, particularly the elementary 
course (because most of the research has 
been on it). We pay only slight attention 
to high school economics and none to pre- 
high school teaching, graduate instruction 
(except for training of college economics 
teachers), and informal economics edu- 
cation. 4 With minor exceptions, we con- 
fine ourselves to literature published in 

3 A prior question is whether fruitful research on 
such problems is possible by anybody. Casual empiricism
by the older of us suggests that 20 years ago
large numbers of economists and faculty members 
in liberal arts colleges generally would have an- 
swered with a resounding no. It is likely that many 
of them still feel that way. The proper answer is 
that any important activity like college teaching is 
appropriate for research. The only questions are 
when a subject becomes ripe for fruitful investiga- 
tion and whether the likely results justify the cost. 
Thirty or forty years ago an answer of "not yet" 
was plausible, but no longer.

4 Those interested in precollege economic education
are referred to Donald R. Wentworth, W. Lee
Hansen, and Sharryl H. Hawke [175, 1977] and W.
Lee Hansen et al. [59, 1977].



economics journals and books by economists.
Most of the work is in the Journal
of Economic Education and the annual 
Papers and Proceedings of the American 
Economic Association. We focus on re- 
search findings in contrast to descriptions 
of courses, teaching methods, and curric- 
ula. 

We have omitted certain topics partly 
due to space constraints and partly be- 
cause they do not fit well with the main 
focus of the survey. We regret the omis- 
sion of radical economics. Radical econo- 
mists have criticized orthodox economics 
teaching severely and have published dis- 
cussions of alternative approaches. 5 But 
to the best of our knowledge, they have 
not done enough research on the results 
to warrant inclusion here. For similar rea- 
sons, we have omitted discussion of the 
experimental elementary courses sponsored
by the Joint Council on Economic
Education (JCEE). 6 Also omitted is a dis- 
cussion of sex differences in economics ed- 
ucation, which would report that females 
enter college at a disadvantage (appar- 
ently for cultural reasons), which tends to 
persist but not to worsen during introduc- 
tory economics. 7 

Three decades ago a subcommittee of 
the American Economic Association reported
that "the content of the elementary
course has expanded beyond all possibility
of adequate comprehension and
assimilation by a student in one year of
three class hours a week" [162, Horace 
Taylor, 1950, p. 56]. This kind of criticism 

5 The first draft of this survey discussed Richard
C. Edwards and Arthur MacEwan [41, 1970]; Mi- 
chael Zweig [179, 1972]; John G. Gurley [56, 1975]; 
Michael Meeropol [105, 1975]; and Robert Buchele
and William Lazonick [26, 1975]. See also Martin 
Bronfenbrenner [24, 1970, p. 765]. 

6 See Kenneth and Elise Boulding [22, 1974]; Ren- 
digs Fels [49, 1974]; Richard H. Leftwich and Ansel 
M. Sharp [89, 1974]; Phillip Saunders [133, 1975]; 
and Barbara and Howard Tuckman [165, 1975]. 

7 For a detailed survey of the literature evaluating 
the relationship between sex and learning economics 
see Siegfried [142, 1979], which was a portion of 
the first draft of this article. 



Siegfried and Fels: Teaching College Economics 



has been repeated again and again. 8 Ac- 
cording to George Stigler, "The watered- 
down encyclopedia which constitutes the 
present course in beginning college eco- 
nomics does not teach the student how 
to think on economic questions. The brief 
exposure to each of a vast array of tech- 
niques and problems leaves the student 
with no basic economic logic with which 
to analyze the economic questions he will 
face as a citizen. The student will memorize
a few facts, diagrams, and policy recommendations,
and ten years later will be
as untutored in economics as the day he 
entered the class" [159, 1963, p. 657]. 
Such criticism is based on an assumption 
about what the typical elementary course 
is like. The assumption may be true, but 
has never been verified. We shall say no 
more about it. 

Whereas the introductory course has 
been the subject of much discussion, criti- 
cism, and research, the undergraduate 
major has drawn little attention. An ex- 
ception is David Hartman's study [64, 
1978; 65, 1978]. 9 According to Hartman,
the goal implicit in the general examina- 
tions given to undergraduate majors at 
Harvard is skill in micro theory, macro 
theory, and the use of each "to answer
real world questions" [64, 1978, p. 13] (see
also [65, 1978, p. 87]). He concluded that
the introductory course is the high point
of the undergraduate major at Harvard
and that the conventional course in inter- 

8 For other examples of the standard criticism, see
Fels [46, 1955; 49, 1974], George L. Bach [11, 1967;
12, 1976], Leftwich [88, 1976], Gurley [56, 1975],
Hansen [57, 1975], Leftwich and Sharp [89, 1974], 
Saunders [133, 1975], and Barbara and Howard 
Tuckman [165, 1975]. 

9 Other recent exceptions, albeit with little research
content, are David Morawetz [107, 1976; 108,
1978] and Bronfenbrenner [25, 1978]. According to
Morawetz, "The aim of an economics education is 
to train students to become economists" [107, 1976, 
p. 1]. Such a goal seems irrelevant to the vast major- 
ity of undergraduates in American colleges and uni- 
versities. Bronfenbrenner says, "I prefer economics 
to become more frequently a 'junior Ph.D. major' " 
[25, 1978, p. 22].



mediate micro theory "does not appear 
to have much impact on either students' 
knowledge of micro theory or their ability 
to apply it" [65, 1978, p. 90]. He found 
some evidence that a policy-oriented 
macro course has an impact on what stu- 
dents know about macro theory, but it is 
not strong enough to pass a significance 
test at conventional levels of acceptance. 

B. The Production Function for 
Economics Education 

In the rest of the survey, we organize
the massive literature on teaching
methods and techniques around the typi- 
cal components in production function 
analysis. Production function studies can 
help determine whether or not economics 
instruction is efficient. This has important 
policy implications, especially for times of 
austere instructional budgets, since ineffi- 
ciency means that it is possible to increase 
school outputs without additional inputs. 10 
The vast majority of the research on eco- 
nomics education has been concerned 
with evaluating teaching methods. The 
implied theoretical model is a production 
function that shifts as a consequence of 
a particular technique. 11 The conceptual- 
ization of these empirical studies, how- 
ever, has been disorganized. 

The absence of established theory 
of learning has made the specification of

10 There is, however, a fundamental statistical and 
interpretive issue: Most production function studies 
are based on observed behavior, in which case the 
estimated relationships will not map the production 
frontier unless all producers are generating maximum
output for given inputs. Ironically we are usually
engaged in such analyses because we suspect
that production is not occurring on the frontier. 

11 Most of the difficulties in modeling and estimat- 
ing economics education production functions are 
common to the specification and estimation of general
educational production functions. However, in
the latter case more attention has been devoted to
the impact of sociodemographic characteristics than
to teaching technologies on school performance.
Comprehensive surveys of this literature are in- 
cluded in Elchanan Cohn [32, 1979] and Richard 
Murnane [112, 1975]. 
production functions difficult. In general,
linear additive models have been speci- 
fied. There has been little concern with 
such issues as simultaneity (if student in- 
terest is an output, does it not feed back 
into cognitive understanding?), functional 
form (are there interactions among the in- 
dependent variables?), and the statistical 
techniques employed (many dependent 
variables are limited, even dichotomous, 
making ordinary least squares regression 
analysis inappropriate). The studies have 
usually arisen from an instructor's initia- 
tive to try a new teaching technique and 
typically involve a comparison of the 
course before and after or with and with- 
out the new innovation. Only a limited 
number of the studies have used broad- 
based samples, and very few have consid- 
ered whether the benefits are worth the 
costs. 

To evaluate innovative instructional 
techniques it is necessary to hold other 
things, especially the level of inputs, con- 
stant. William I. Davisson and Frank J. Bo- 
nello [40, 1976] describe a useful taxon- 
omy for organizing research on the 
production function for learning college 
economics. Their approach is superior to 
the ad hoc theorizing that characterizes 
most of the economics education litera- 
ture. Davisson and Bonello identify three 
separate categories of inputs: human capi- 
tal (SAT scores, grade point average, and 
pretest scores); utilization rates (time 
spent on the course by students); and tech- 
nology — the efficiency with which effort 
is transformed into cognitive achievement 
(lecturer effectiveness, text effectiveness, 
etc.). This distinction draws attention to 
the potentially defective conclusions that 
may arise from simple comparisons of test 
scores between control and experimental 
groups, or even a comparison of control 
and experimental groups' performance 
that controls for human capital character- 
istics. A shift in the production function 
as a consequence of some alternative tech- 



nology can be detected correctly only if 
input utilization rates are held constant. 
Otherwise, performance comparisons 
consist of output at different levels of in- 
puts on potentially different production 
functions, and it is impossible to disaggre- 
gate the effect of changes in the level of 
inputs from changes in the rate at which 
inputs are transformed into outputs. 
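
The point about utilization rates can be illustrated with a small simulation. What follows is a minimal sketch in Python, not anything taken from Davisson and Bonello: the data are synthetic, the linear additive form and every coefficient (including the six extra study hours assumed for the experimental group) are assumptions chosen for illustration, and the helper name ols is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical inputs, following the three categories named in the text.
pretest = rng.normal(12.0, 3.0, n)      # human capital (pretest score)
treated = rng.integers(0, 2, n)         # technology: 1 = new teaching technique
# Assumption: the new technique also induces students to study more.
hours = rng.normal(40.0, 5.0, n) + 6.0 * treated   # utilization rate

# Assumed "true" linear additive production function.
TRUE_SHIFT = 1.0
posttest = (2.0 + 0.8 * pretest + 0.25 * hours
            + TRUE_SHIFT * treated + rng.normal(0.0, 2.0, n))


def ols(y, *columns):
    """Least-squares coefficients; the intercept is in position 0."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# (a) Controls for human capital only; utilization is left free to vary.
shift_without_hours = ols(posttest, pretest, treated)[2]
# (b) Holds utilization constant as well.
shift_with_hours = ols(posttest, pretest, hours, treated)[3]

print(f"estimated shift, hours omitted:  {shift_without_hours:.2f}")
print(f"estimated shift, hours included: {shift_with_hours:.2f}")
```

Under these assumptions, the regression that omits hours credits the extra study time to the technique, so its estimated shift is larger than the one obtained when hours are held constant.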

The production function involves multi- 
ple outputs as well as multiple inputs. Ac- 
cording to Judith Yates, "the objectives 
of the education system must always be 
borne in mind [in research on economics 
education]. These objectives are often ill- 
defined and may vary significantly among 
institutions. ... At the risk of overlook- 
ing important ones, a number of objec- 
tives can be listed: students' growth and 
development with regard to skills (which 
can include application, critical thinking, 
creativity, motor skills, etc., as well as com- 
prehension and understanding); their so- 
cial development (including leadership, 
communication, interpersonal relations, 
etc.); their acquisition of vocational skills, 
and so on." Furthermore, "The contribu- 
tion of a particular technique or teaching 
method to the process of learning how to 
learn rather than to the specific output 
of precisely what was learned may be its 
most important attribute" [178, 1978, p. 
13]. Other outputs are increased sophisti- 
cation with respect to values and attitudes 
and changes in ideological orientation. 
Colleges also typically serve a screening 
role for graduate schools and the labor 
market. 

II. Measuring Outputs 

Those doing research on college teaching
of economics are only beginning to
come to grips with the multiplicity of ob- 
jectives. In practice, they have measured 
output in three principal ways: examina- 
tions (usually objective tests), student eval- 
uation questionnaires, and a kind of mar- 
ket test using number of majors attracted 
or enrollment. The unsolved problem of
measuring all the outputs and assigning
weights to them has been pointed out by
Yates and others as limitations on the
research done. A related problem, which has
been receiving attention in recent years,
arises from the fact that different teaching 
methods have different impacts on differ- 
ent students, so that the distribution of 
outputs needs to be measured. This sec- 
tion is organized around five dimensions 
of output: cognitive performance, student 
attitudes toward the subject and the educational
process, the impact of understanding
and attitudes on subsequent behavior,
changes in values, and the
distribution of benefits. 

A. Measurement of Learning 

The bulk of research on college teach- 
ing of economics is based on the implicit 
assumption that "the major criterion or 
objective of economic education is a very 
narrowly defined concept of learning, re- 
lated in some way to student performance 
as measured by achievement or course 
grade" [178, Yates, 1978, p. 12]. Availabil- 
ity of nationally normed, validated objec- 
tive tests from which the data can be ana- 
lyzed with standard statistical tools has 
encouraged this procedure. 

A number of objective tests are avail- 
able for research on economics education. 

(1) The Test of Economic Understanding 
(TEU) is a 50-item multiple choice test of 
what every high school graduate should 
know about economics [71, JCEE, 1963]. 
It has recently been revised [154, John 
C. Soper, 1979]. The new version is called 
the Test of Economic Literacy (TEL). 12
For a time the TEU was used to measure 
learning in the college elementary course, 
a purpose for which it is unduly simple. 

(2) The elementary economics test in the 
Educational Testing Service's College 

12 The TEL is meant for grades 11 and 12. The
Joint Council on Economic Education, its publisher, 
also has tests for grades 2-3, 4-6, and 7-9. 



Level Examination Program is a 90-minute
100-item multiple choice test designed
to enable colleges to determine
whether transfer students and students 
who have learned economics on their own 
are entitled to credit for the elementary 
course and/or admission to advanced
undergraduate courses. It concentrates 
heavily on theory. Separate micro and 
macro versions are available. Because of 
high cost and tight security regulations, 
it has been used for only one research
project that we know of, namely John J.
Siegfried [141, 1977].

(3) By far the most widely used instrument is
the Test of Understanding in College
Economics (TUCE) [72, JCEE, 1968], which
has been used in at least 71 studies. 13 For
other discussions of the TUCE, see Fels [48,
1970], Arthur L. Welsh and Fels [174, 1969],
Darrell R. Lewis and Tor Dahl [91, 1971], and
T. R. Swartz, F. J. Bonello, and W. I.
Davisson [161, 1977].

(4) The Test of Economic Comprehension
(TEC), which is similar to the TUCE in content
and construction, has been used mainly in
Great Britain [9, Richard Attiyeh and Keith
Lumsden, 1971].

The TUCE is in two parts, each of which 
has two alternative forms of 33 multiple 
choice questions. Part I is intended to 
cover the content of the typical first semester
of the college course, i.e., macroeconomics
plus some basic micro concepts.
Part II is intended for the typical
second semester: the theory of the firm, 
marginal productivity analysis, interna- 
tional economics, and comparative sys- 
tems. 14 Since the two forms for each part 
are approximately equal in difficulty, one 

13 For this information, we are indebted to John 
Vahaly. 

14 Supply and demand analysis is included in both 
Part I (bare elements) and Part II. In addition to 
the four forms described above, a "hybrid" test consisting
of 33 questions selected from both Part I and
Part II was devised for Saunders's lasting-effects 
study discussed below. See Saunders and Welsh [136, 
1975] and Saunders [129, 1971; 132, 1973]. 
can be given at the beginning of the semester,
the other at the end, to measure
the amount of value added. Although in 
subject matter content the specifications 
were meant to conform to the typical col- 
lege course, they depart from it with re- 
spect to kinds of questions, having approx- 
imately one-third each in the categories 
of recognition-and-understanding, simple 
applications, and complex applications. 
(See Fels [47, 1967] and Welsh and Fels 
[174, 1969]). The committee responsible 
for the test hoped to influence economics 
instructors to put more emphasis on appli- 
cations and less on memorization and ab- 
stract theory. National norms were estab- 
lished using data from 50 colleges. 
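
As a rough illustration of the value-added idea, the sketch below computes per-student gains from matched pre- and post-course forms. It is a toy under stated assumptions: the scores are invented, and the function name value_added is hypothetical; only the 33-question form length comes from the text.

```python
from statistics import mean

ITEMS_PER_FORM = 33  # each TUCE form has 33 multiple choice questions


def value_added(pre_scores, post_scores):
    """Per-student gain: end-of-semester score minus start-of-semester score."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("need one pre and one post score per student")
    for score in (*pre_scores, *post_scores):
        if not 0 <= score <= ITEMS_PER_FORM:
            raise ValueError(f"score {score} outside 0..{ITEMS_PER_FORM}")
    return [post - pre for pre, post in zip(pre_scores, post_scores)]


# Hypothetical scores for five students (not real TUCE data).
pre = [10, 14, 9, 20, 16]     # one form, given at the start of the semester
post = [18, 19, 15, 24, 25]   # the matched form, given at the end

gains = value_added(pre, post)
print(gains, mean(gains))
```

Because the two forms are only approximately equal in difficulty, a gain computed this way is an estimate rather than an exact measure of learning.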

The heavy dependence of economics 
education research on the TUCE calls for 
appraisal of its strengths and weaknesses. 
On the plus side, the quality of the test 
is considerably higher than the tests that 
can be constructed by individual research 
workers. 15 Constructing a reasonably sat- 
isfactory test is enormously difficult and 
expensive. The existence of matched pairs 
of tests, making possible measurement of 
value added in controlled experiments, 
and the availability of norming data for 
comparison purposes are legitimate rea- 
sons for the popularity of the TUCE. It 
was tested for validity [91, Lewis and 
Dahl, 1971] and found to be an effective 
discriminator of students with high and 
low levels of ability and a good measure 
of prior ability and analytical skills [28, 
Stephen G. Buckles and Welsh, 1972]. 
Lewis and Dahl found that its simple ap- 
plication questions were correlated with 

15 Despite the widespread use of multiple choice 
questions for grading purposes, most economists are
not familiar with the elementary principles of test
construction, e.g., that specifications for the test must
be drawn up in advance, that questions need to be
pretested on seven or eight hundred students, that
the right response must be shown in the data to
be chosen more often by good students than by poor
students and vice versa for the distractors, and that
questions need to be edited by a psychometrician
(among other reasons to avoid tipoffs to testwise
students). For details, see Fels [48, 1970].



critical thinking skills as measured by the 
Watson-Glaser Critical Thinking Appraisal
Test.

The shortcomings of the TUCE, how- 
ever, are considerable. In the first place, 
no general purpose test is likely to con- 
form exactly to the purposes and content 
of any particular course. In the second 
place, not all the questions are satisfactory. 
(The TUCE is currently undergoing revi- 
sion to replace unsatisfactory and out- 
dated questions.) In the third place, no 
multiple choice test can measure all the 
objectives of elementary instruction. 16 

The correlation between scores on the 
best possible multiple choice test and 
grades on essay examinations varies ac- 
cording to the subject — high for mathe- 
matics, low for English literature, with 
economics an intermediate case. We do 
not know of any study directed at the specific
question of how TUCE test scores correlate
with essay tests built to the same
specifications. H. Tuckman found a nega- 
tive simple correlation between a ten- 
question version of the macro TUCE and 
course grades [166, 1975]. P. F. Labinski
at a community college found an r² of .22
between the TUCE and course grades. He 
argued that the objectives of economics 
instruction at two-year colleges differ 
from four-year colleges [86, 1978]. Inas- 
much as the TUCE was designed to mea- 
sure not so much the objectives of four- 
year colleges themselves but what a com- 
mittee thought those objectives should be, 
it is not surprising that two-year colleges 
with still other objectives would find the 
TUCE a poor measure of their output.
Elisabeth Allison, in a Harvard Working 
Paper that estimates production functions 
for economics education, reports correla- 
tions ranging over a three-year period 
from .74 to .85 between scores on a 60- 
minute, 40-question multiple-choice test 

16 T. R. Swartz, Bonello, and Davisson have docu-
mented the shortcomings of the TUCE as a measure
of cognitive achievement for Notre Dame [161, 
1977]. 

(parts of which are from TUCE) and a two-
hour essay portion of the same three-hour 
final examination at Harvard [3, 1977, p. 
15]. 17

Future progress in economics education 
will require a computerized national test 
bank consisting of thousands of carefully 
edited multiple choice questions with data 
proving that they "work." The questions 
need to be classified not only by subject 
and type of question but also by kind of 
course or courses for which they are useful 
(high school, elementary college, interme-
diate theory, etc.).

The availability of two matched forms 
of the TUCE has led to pre- and post- 
course testing, which permits several 
forms of the output measure to be speci- 
fied: (1) absolute achievement— the post- 
test score; (2) absolute improvement — the 
difference between the post-test and the 
pre-test score; (3) percentage improve- 
ment — absolute improvement divided by 
the pre-test score; and (4) gap closing 
measure — absolute improvement divided 
by the potential gain in score (which is 
the difference between the perfect score 
and the pre-test score). The absolute
achievement score reflects the level of un- 
derstanding at a point in time. It is a stock 
measure. The absolute improvement 
score measures the increment of learning 
during a course. 18 An alternative is to use

17 The major study in the United Kingdom (The
Economics Education Project) carried out between
1969 and 1973 analyzed the correlation between
TEC scores and grades awarded at both university
and high school levels. At high school level the R²
between final grades as awarded by the Examination
Boards varied from .05 to .26 (sample sizes: 1970,
4,254 students; 1973, 3,335 students). At university
level the R² was significantly greater than zero (rang-
ing from .05 to .49) for 27 out of 38 courses with a
total of 2,823 students. The mean was .12. (We
are indebted to Alex Scott for this footnote. These
results will be in a forthcoming book by Lumsden,
Attiyeh, and Scott [97].)

18 Failure to consider explicitly whether the stock
or the flow is to be measured is a common fault. 
For instance, most studies of economics education 
include sex as a binary variable but are not explicit 
about whether it is the stock or the flow that is associ- 
ated with sex. Studies implicitly concerned with the 



the post-test achievement score and con-
trol for initial economic understanding by
including the pre-test score as an inde- 
pendent variable in the regression analy- 
sis. The percentage improvement and 
gap-closing measures were developed be- 
cause there may not be a constant 
difficulty of learning throughout the spec- 
trum from total ignorance to total mas- 
tery; thus any aggregate measure of suc- 
cess should measure improvements by 
students on the basis of the difficulty of 
achieving them [177, Simon Whitney, 
1960]. The percentage improvement
form implies that it is more difficult for 
poorer students to improve their scores 
by a given absolute amount. The gap-clos- 
ing measure implies the opposite, that it 
is more difficult to improve a score by a 
given amount if one starts at a higher level 
of mastery (because only the most difficult 
material remains to be mastered) [155, 
Soper and Richard Thornton, 1976, pp. 
86-87]. 
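
The four forms of the output measure can be made concrete with a short sketch. It is only an illustration of the definitions given above: the function name, the 30-point maximum score, and the two students' scores are assumptions invented for the example and do not come from the TUCE norming data.

```python
def output_measures(pre, post, max_score):
    """Return the four output measures for one student's pre/post scores."""
    if not 0 < pre < max_score:
        # Percentage improvement needs pre > 0; gap closing needs pre < max.
        raise ValueError("pre-test score must lie strictly between 0 and max_score")
    gain = post - pre
    return {
        "absolute_achievement": post,             # (1) stock measure
        "absolute_improvement": gain,             # (2) flow measure
        "percentage_improvement": gain / pre,     # (3) gain relative to start
        "gap_closing": gain / (max_score - pre),  # (4) gain relative to potential
    }

# Hypothetical students with the same 5-point absolute improvement.
weak = output_measures(pre=10, post=15, max_score=30)
strong = output_measures(pre=20, post=25, max_score=30)

print(weak["percentage_improvement"], strong["percentage_improvement"])
print(weak["gap_closing"], strong["gap_closing"])
```

With identical absolute improvement, the percentage form credits the initially weaker student more, while the gap-closing form credits the initially stronger student more, which is the contrast drawn in the text.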

At first glance absolute improvement, 
percentage improvement, and gap-clos- 
ing measures of output may appear sub- 
ject to ceiling or floor effects. Random 
guessing would generate about a 25 per- 
cent score on a multiple choice test with 
three distractors in each question. Since 
the average score in the TUCE national 
norm was 57 percent, floor effects seem 
more likely than ceiling problems. Wil- 
liam Becker and Michael Salemi [18, 
1977] used a nonlinear model to deal with 
ceiling effects. More research is required 
on this issue because the existence of an 
upper or lower boundary tends to make 
the dependent variable inappropriate for 
ordinary least squares estimation. 
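
The arithmetic behind the floor and ceiling comparison can be sketched as follows. The 25 percent chance score and the 57 percent norm mean are the figures quoted above; the function and variable names are assumptions made for this illustration.

```python
def chance_score(n_distractors):
    """Expected share correct from random guessing with one right answer."""
    return 1 / (n_distractors + 1)

floor = chance_score(3)   # three distractors plus one correct response
norm_mean = 0.57          # average score in the TUCE national norm (from the text)

room_below = norm_mean - floor   # distance from the mean down to the chance floor
room_above = 1.0 - norm_mean     # distance from the mean up to a perfect score

print(round(floor, 2), round(room_below, 2), round(room_above, 2))
```

Under these figures the average score sits closer to the chance floor than to the ceiling, which is consistent with the remark that floor effects seem the more likely problem.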

stock generally report positive correlations between 
end-of-course student performance and male sex. 
Studies of the flow generally do not. Only when the 
distinction between stock and flow became recog- 
nized was the correct conclusion drawn: the evi- 
dence shows only that females start the introductory 
economics course at a disadvantage, not that they 
are at a disadvantage in learning economics during 
the course. 

Incentives seem to make a difference.
Sometimes tests used to measure output 
in an experiment count toward grades, 
sometimes not. William Wehrs found that 
the greater incentives of counting the 
TUCE when used as a post-test made a 
12 percent difference in absolute score 
(and a gain difference of 40 percent) after 
holding constant the pre-TUCE score, 
high school rank, age, and sex [168, 1978]. 
Only a quarter of the instructors partici- 
pating in norming the TUCE used the 
post-test results for grading, biasing the 
norms downward. But the scores used for 
the pre-test norms did not count toward 
grades at all, biasing improvement up- 
ward. 

Most learning technology improve- 
ments have been adopted in introductory 
courses. Richard McKenzie and Robert 
Staaf argue that introductory courses, 
which are frequently part of "distribution
requirements," are likely to be inferior 
goods [104, 1974, pp. 30-33]. The intent 
of introducing improvements in teaching 
technology may be to induce a substitu- 
tion effect toward the discipline, but there 
is also an income effect associated with 
the relative price change (in terms of re- 
duced student effort to achieve a fixed 
learning or grade level) brought about by 
improved technology in a single disci- 
pline. The benefits from improvements in 
teaching introductory economics may 
manifest themselves in released time de- 
voted to studying other subjects or con- 
sumed in leisure. Allen Kelley found em- 
pirical support for this proposition [83, 
1975]. If this occurs, the technology 
change may be deemed successful unless 
we value the alternative uses of time at 
zero. 

B. Measurement of Student Attitudes 

Student evaluations of courses and pro- 
fessors have come into widespread use in 
American colleges and universities over 
the past decade. Three types of student
attitudes are likely to be of interest: (1)
attitudes toward policy issues, (2) attitudes 
toward economics, and (3) opinions re- 
garding the quality of instruction. The first 
will be considered in the subsection on 
values. The second was studied by David 
Ramsett, Jerry Johnson, and Curtis Adams 
using data from three small midwestern 
universities [122, 1973]. They found that 
those students who were more favorably 
disposed toward the subject of economics 
did better on the post-course TUCE, hold-
ing pre-TUCE and sociodemographic fac-
tors constant. Lewis Karstensson and
Richard Vedder confirmed these findings 
with a similar analysis of Ohio University 
students, finding that students whose in- 
terest in economics grew the most during 
the semester also learned the most [74, 
1974]. However, there was no attempt to 
control for study time. 

The principal controversies surround- 
ing student evaluations of teaching arise 
from the various uses to which they can 
be put. Some have argued that course 
evaluations are undesirable because they 
do not adequately represent student cog- 
nitive achievement. Miriam Rodin and 
Burton Rodin, using 12 calculus classes, 
reported that students who learned the 
most rated their instructors the worst 
[124, 1973]. Similar findings were re- 
ported by Dennis Capozza, who regressed 
student ratings of instructors on the in- 
crease in TEU scores and the average 
grade of eight different economic classes 
[29, 1973]. He found that higher course 
evaluations were associated with higher 
grades and less gain in TEU score. Studies
by Soper, using data from the University 
of Missouri [151, 1973], and by Attiyeh 
and Lumsden, using a sample of approxi- 
mately 4,700 British students [9, 1971], 
yielded similar results. 19 

19 Both the Rodin and Rodin [124, 1973] and the
Soper [151, 1973] studies were of large lecture
classes with discussion groups led by graduate teach-
ing assistants. The instructor evaluations were of the

On the other side of the issue are data
from Saunders [131, 1972], which appear 
to support the validity of student evalua- 
tions as measures of cognitive achieve- 
ment in three of four tests, and from Peter 
Sloane [149, 1972]. A more recent study 
by W. Douglas Morgan and Jon Vasché
[109, 1978] also concludes that students 
can and do recognize good teaching. They 
estimated the marginal product of various 
teaching assistants (TA's) in a large intro- 
ductory-level macroeconomics course at 
the University of California, Santa Bar- 
bara, by regressing final student grade 
performance on typical control variables 
and binary variables for specific teaching 
assistants. They then correlated the stu- 
dent course evaluations of the TA's with 
their impacts on final course grades, which
the TA's did not control. There was a per-
fect rank correlation of evaluation scores
with the estimated marginal products on
(1) preparation for class, (2) communica- 
tion skills, and (3) ability to respond to 
questions. Ratings on knowledge of the 
subject were positively but not perfectly 
correlated with marginal products. 
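
A rank correlation of the kind described can be sketched as below. This is not Morgan and Vasché's data or code: the five teaching assistants, their estimated marginal products, and their ratings are hypothetical numbers, and the Spearman shortcut formula used here assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("this shortcut formula requires untied values")
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical figures for five teaching assistants.
marginal_product = [0.8, -0.3, 1.5, 0.1, 0.6]    # regression coefficients on TA dummies
preparation_rating = [4.1, 2.9, 4.8, 3.2, 3.9]   # student rating: preparation for class
knowledge_rating = [4.0, 3.5, 4.6, 3.1, 4.2]     # student rating: knowledge of subject

rho_preparation = spearman_rho(marginal_product, preparation_rating)
rho_knowledge = spearman_rho(marginal_product, knowledge_rating)
print(rho_preparation, rho_knowledge)
```

In this made-up example the preparation ratings order the assistants exactly as the marginal products do, while the knowledge ratings are positively but not perfectly rank-correlated, mirroring the pattern reported in the study.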

Douglas Needham has reconciled the 
conflicting empirical relationships be- 
tween learning, course evaluations, and 
expected grades [113, 1978]. Using a 
model of student time allocation in which 
students equate the marginal grade pro- 
ductivity of time devoted to each activity, 
Needham shows that the theoretical ex- 
pectations relating grade levels to course 
evaluations and relating student learning 
to course evaluations are ambiguous be- 
cause student time allocation decisions de- 
pend on the rate of transformation of ef- 



teaching assistants. Kelley found that course evalua-
tions were not strongly influenced by the perfor-
mance of individual teaching assistants [81, 1972].
If this is correct, then lower evaluations of assistants 
whose students did worse on cognitive achievement
may be spurious, the major effect on their perfor- 
mance arising from the lectures and other differ- 
ences among them, which were not controlled in 
the Rodin and Rodin and Soper studies. 



fort into learning and the rate of trans- 
formation of learning into grades, rather 
than the absolute levels of learning and 
grades. 

The debate on whether student course 
evaluations measure cognitive achieve- 
ment well may overlook the multiple di- 
mensions of teaching. If parties in the in- 
structional process value the various 
dimensions differently, there may well be 
reason for an assessment of instructional 
characteristics aside from cognitive 
achievement. The resolution of this ques- 
tion then hinges on the assignment of val- 
ues to different aspects of educational out- 
put. Some of the parties to the process 
may value "entertainment" highly and 
desire it to be a part of their education. 
Even if this is what course evaluations as- 
sess, they may be warranted. 20 

Students' assessment of instruction can 
be obtained either through systematic 
course evaluations, informal student opin- 
ion, or carefully structured intensive in- 
terviews. Although systematic course 
evaluations have the advantage of a larger 
sample, they may be subject to greater 
measurement error than intensive inter- 
views because students feel less responsi- 
ble in responding. But since systematic 
course evaluations are usually anonymous, 
they might be less biased. 

The interpretation of student evalua- 
tions is also important. By far the most 
controversial issue is whether instructors 
can "buy" higher evaluations by lowering 
the (effort) price to students of achieving 
a given grade. McKenzie [102, 1975] and 
Paul Kipps [83, 1975] provide theoretical 
analyses employing price theory. McKen- 
zie shows how an instructor can increase 
the average course grade by a parallel 

20 However, Frank Costin, William Greenough,
and Robert Menges in their comprehensive assess-
ment of student course evaluations (mostly in teach-
ing psychology) conclude that "existing data ... do
not permit a conclusion that sheer 'entertainment'
is what makes students perceive a teacher as a 'good'
one" [35, 1973, p. 51].

shift in the leisure-grade transformation
curve, a shift favoring the better students, 
or a shift favoring the poorer students. If 
course evaluations improve as students do 
better in the class relative to their expec- 
tations, a relaxation of grading standards 
is consistent with either a positive, neu-
tral, or negative correlation with individ-
ual student course evaluations. If the
worst students are the ones who are pleas- 
antly surprised by grading standards, they 
will rate the instructor higher than ex- 
pected and, ceteris paribus, create a nega- 
tive overall correlation between evalua- 
tions and grades for the course. But if 
instructors "buy" evaluations by raising
grades, the positive correlation should ap- 
pear in cross-course comparisons [76, 
James Kau and Paul Rubin, 1976]. 

These expectations seem to be borne 
out in the empirical studies. Kelley re- 
gressed course and professor evaluations 
on expected course grade, student ability, 
and other standard control variables for 
258 students at the University of Wiscon- 
sin [80, 1972]. He found that the coeffi-
cient on students' expected course grade
was significantly greater than zero, but its 
magnitude was small. It would have taken 
an enormous change in the instructor's 
overall grading standards to generate a 
trivial movement in his course evaluation 
scores. 

Studies of the association between per- 
ceived grading standards and course eval- 
uations across classes have found the ex- 
pected relationship— easier grading is 
positively correlated with evaluations. In 
an analysis of 339 sections of social science 
courses at Central Michigan University,
Alan Nichols and Soper found that stu-
dents' expected course grade was a signifi-
cant (both statistically and practically) de- 
terminant of course evaluations, holding 
course environment characteristics con- 
stant [115, 1972]. Class size and course 
level were not related to course evalua-
tions. Rolf Mirus [106, 1973], in a regres-
sion study of 122 courses at the University
of Alberta, confirmed the Nichols and 
Soper findings. He found that expected 
course grade was significantly related to 
course evaluations, but class size, type of
course, the evaluation response rate (frac- 
tion of enrolled students who completed 
the evaluation), or whether the course was 
required did not influence evaluations. 
According to Mirus's findings, "a professor 
who, compared to his colleagues, makes 
the class expect a 1 point higher grade 
can improve his own evaluation by .85 of 
a point" [106, 1973, p. 36]. Both course
evaluation and student grade were mea- 
sured on a five-point scale. In a study of 
201 classes in the College of Business Ad- 
ministration at the University of Georgia, 
Kau and Rubin reported similar findings 
[76, 1976]. Class level, required course, 
and expected grade were all statistically 
significant determinants of course evalua- 
tions. The average class grade point aver- 
age, class size, and the percent of majors 
in the course had no effect. 

The major problem in interpreting the 
studies that find a statistically significant
positive association between course 
grades and students' evaluations of their
instructor and between cognitive achieve- 
ment and students' evaluations is sorting 
out cause and effect. If better instructors 
tend to emphasize the kind of higher 
learning that students are likely to pro- 
test (as costly to acquire), and if the stan- 
dard tests of cognitive achievement fail 
to pick up these effects, the better in- 
structors might have classes that do poorly 
on standardized tests and give low rat-
ings. 21 This, however, is speculation.

21 For a contrary view, namely, that student evalu-
ations will encourage instructors to teach lower level
cognitive material (e.g., basic facts and concepts with
applications to narrowly defined problems, rather 
than analyzing, evaluating, and synthesizing broad 
real-world problems), see Michael Everett [45, 
1977]. He argues that because student evaluations 
seem to depend mainly on clarity of communication 
in the classroom, well-prepared lectures, and instruc- 

Other than cognitive achievement (per-
haps in a negative direction) and grading 
standards, what do student course evalua- 
tions measure? In a careful study of 4,996 
course evaluations from the Graduate 
School of Business at Stanford University, 
Lumsden found that specific characteris- 
tics like clarity of presentation, enthusi- 
asm, and respect for student opinion had 
the largest positive impact on course eval- 
uations [96, 1974]. Sensitivity to a stu- 
dent's needs, interest in the student as a 
person, and availability outside of class 
were unimportant (although one might 
expect different results from undergradu- 
ates). Costin, Greenough, and Menges, 
summarizing many studies of course eval- 
uations, concluded that "such attributes 
as preparedness, clarity, and stimulation 
of students' intellectual curiosity were
typically mentioned by students in de- 
scribing their best instructors" [35, 1973, 
p. 52]. 

The only widely used student evalua- 
tion form is the Purdue Rating Scale for 
College Instructors. It is a questionnaire 
consisting of 28 items. Each student rates 
the instructor from A to F on five broad 
categories of questions — personal charac- 
teristics (e.g., patience and understand-
ing), objectivity (e.g., willingness to listen
to and talk about divergent ideas or view- 
points that oppose his/her own), exposi-
tion skills (e.g., concise presentation), tests
and grading (e.g., impartiality), and sub- 
ject matter knowledge. 22 Most evaluations 
of instruction are made with the use of 
locally designed and administered ques- 
tionnaires, making it difficult to compare 
the effects of instructional techniques on 
student assessments of teaching quality 

tor enthusiasm, rather than on stimulating intellec- 
tual curiosity or illustrating how economists analyze 
broad complicated problems, they may encourage 
the teaching of lower level material that can most 
conveniently be taught with those instructional 
skills. 

22 For more detail on the Purdue Rating Scale,
see H. H. Remmers and J. A. Weisbrodt [123, 1965].



across schools. The advantage of "home- 
made" course evaluation instruments is 
the greater faculty confidence in results 
of their own handiwork [60, Hansen and 
Kelley, 1973, p. 18]. 

A serious problem with empirical re- 
search on student evaluations of teaching 
involves the transformation of question- 
naire data to numerical indexes appropri- 
ate for statistical analysis. The typical 
course evaluation form provides a scale 
of five alternatives, from much better to
much worse, on which students are asked 
to rate the particular course or instructor. 
Thus the questionnaires ask for ordinal 
data. Then the data are usually coded on 
a one to five scale and employed in the 
analysis. Raymond Battalio, Joe Hulett, 
and John Kagel have shown that conclu- 
sions based on regressions using data ob-
tained from an ordinal rating can theoreti-
cally be reversed by performing entirely
legitimate order-preserving transforma- 
tions on the measured variables [15, 1973]. 
The real question, however, is whether 
this difficulty is important in a practical 
sense [140, Siegfried, 1973]. Without 
knowledge of the actual underlying cardi- 
nal scale, it is impossible to evaluate the 
seriousness of the problem. 
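
A small numerical sketch shows how such a reversal can arise. It is not drawn from Battalio, Hulett, and Kagel's paper: the ten ratings, the two groups, and the particular recoding are hypothetical, chosen only so that an order-preserving transformation of the five-point scale flips the sign of a regression slope.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: group = 1 marks students taught with some new technique.
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ratings = [3, 3, 3, 3, 3, 1, 1, 4, 4, 4]   # coded on the usual 1-5 scale

# An order-preserving recoding: 1 < 2 < 3 < 10 < 11 keeps every ranking intact.
recode = {1: 1, 2: 2, 3: 3, 4: 10, 5: 11}
recoded = [recode[r] for r in ratings]

slope_raw = ols_slope(group, ratings)      # difference in group means, 1-5 coding
slope_recoded = ols_slope(group, recoded)  # same comparison after recoding
print(round(slope_raw, 2), round(slope_recoded, 2))
```

Because the regressor is binary, each slope equals the difference in mean ratings between the two groups. The one-to-five coding makes the technique look harmful and the recoding makes it look beneficial, although no student's ordinal response has changed.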

Student course evaluations have proba- 
bly received so much attention because 
they are frequently thought to influence 
faculty behavior. McKenzie and Staaf 
[104, 1974] and Hansen and Kelley [60, 
1973] have used microeconomic theory to 
demonstrate how faculty might respond 
to changes in the reward structure. Fur- 
thermore, the effect of changes in time 
constraints (say, by varying the course 
load) are ambiguous because income ef-
fects may swamp substitution effects.
Becker has shown rigorously that increas- 
ing the weight assigned to high quality 
teaching in the faculty income determina- 
tion process may be ineffective unless im- 
provements are made simultaneously in 
the accuracy of measuring teaching ability 
[17, 1979]. In a further extension Cliff
Huang demonstrates that the faculty work 
response to changing relative rewards to 
teaching and research may be ambiguous 
if the production processes for research 
and teaching are less than certain [68, 
1979]. 

In an empirical study of the faculty sal- 
ary determination process at the Univer- 
sity of Wisconsin, Siegfried and Kenneth 
White found that teaching quality (mea-
sured by course evaluations) was re-
warded positively [146, 1973]. The coeffi-
cient on a variable sensitive to the 
extremes in teaching performance was 
statistically significant at the .10 level. The 
importance of its size depends on the cost 
of improving teaching ratings. David Katz 
reported contrary findings about the re- 
wards for teaching at a large midwestern 
university [75, 1973]. Katz, however, used 
a binary variable, which discriminated 
only between teachers above and below 
the median and consequently did not 
characterize the salary determination sys- 
tem as it was implied in his own interview 
reports, namely, that salaries are sensitive 
to extremes in teaching quality, but re- 
spond very little to differences close to 
the average. Siegfried and White [147, 
1978] have shown that the differences in 
specification — sensitivity to extremes ver- 
sus sensitivity only around the median — 
may cause the difference between Katz's 
findings and theirs. In a comprehensive 
analysis of the determinants of faculty sal- 
aries that was based on data gathered by 
the American Council on Education as 
part of a 1972-73 national cross-section 
of faculty, Howard Tuckman found that 
faculty who had won an award for out- 
standing teaching earned no more than 
others, ceteris paribus [167, 1976]. 

The Siegfried and White study exam- 
ined salary differentials within an aca-
demic department; Katz's sample in-
cluded faculty from various departments
within a single university; Tuckman's data
base (for the result reported here) con-
sisted of economists at various institutions. 
These findings can be reconciled if the 
recognition (and reward) for teaching is 
local. For research there are national 
standards and visible evidence of per- 
formance (publication), while teaching 
reputation is usually well known only to 
local colleagues. Thus salary differences 
based on teaching ability may appear only
within individual departments. 

C. Impact of Understanding and 
Attitudes on Behavior 

Economists generally measure the 
value of education by the discounted pres- 
ent value of the expected difference in 
earnings streams (net of direct costs) at- 
tributable to it. There are at least two ma- 
jor reasons this approach has been con- 
spicuously absent from the literature of 
economics education. (1) Economics is 
usually offered as part of a liberal arts edu- 
cation. Because it is only a small part of 
a large program of education, it is difficult 
to attribute earnings differentials to spe- 
cific courses in the program. (2) Many 
economists believe there are benefits to 
economics education that individuals can- 
not appropriate at reasonable cost, in 
which case private earnings will not fully 
reflect the total benefits from economics 
education. This frequently used "citizen- 
ship argument" for economics education 
means that economics education raises 
students' sensitivity to the political, eco- 
nomic, and social system of which they 
are a part and increases the intelligence 
with which they participate in it (e.g., by 
voting) [58, Hansen, 1977]. 
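
The conventional valuation mentioned at the start of this subsection can be written out in a few lines. The sketch below is generic: the function name, the treatment of direct costs as a single up-front payment, the two-year horizon, and all dollar figures are assumptions made for illustration, not estimates from the literature surveyed here.

```python
def present_value(differentials, rate, direct_cost=0.0):
    """Discounted present value of yearly earnings differentials, net of
    a direct cost assumed (for this sketch) to be paid at time zero."""
    discounted = sum(
        d / (1 + rate) ** year
        for year, d in enumerate(differentials, start=1)
    )
    return discounted - direct_cost

# Hypothetical: extra earnings of 110 and 121 in years 1 and 2,
# a 10 percent discount rate, and a direct cost of 50.
value = present_value([110, 121], rate=0.10, direct_cost=50)
print(round(value, 2))
```

The two obstacles listed above apply to the inputs rather than to the formula: the earnings differential attributable to a single course is hard to isolate, and any benefits that individuals cannot appropriate never appear in the differentials at all.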

The difficulties in assessing public bene- 
fits to economics education have been 
summarized by McKenzie: "Before the 
public benefits . . . can be acquired 
through the political process, the student 
. . . must have sufficient incentive to 
maintain the human capital stock that he 
has acquired. . . . Public choice theory 
predicts that the typical individual stu-
dent-citizens will not have sufficient in- 
centive to incur these costs. . . . Finally, 
economic education must overcome the 
tendency of people, in spite of what they 
know about the economic merits of legis- 
lation, to vote their own private interests" 
[103, 1977, p. 10]. 

One approach to assessing subsequent 
student behavior compares the fraction of 
students in an experimental introductory 
course who go on to major in economics 
with the fraction from a conventional in- 
troductory course. The notion is that stu- 
dents who have favorable attitudes toward 
economics and/or believe there is value 
in economic reasoning are more likely to 
major in economics and also more likely 
to capitalize on their economic reasoning 
skills in later life. Unfortunately this mea- 
sure of effectiveness has several disadvan- 
tages: (1) The percentage of students from 
a class who go on to major depends on 
the denominator as well as the numerator, 
and many exogenous factors (such as cur- 
riculum requirements of other, non-eco- 
nomics, major departments) may influ- 
ence the number of students taking a 
course in economics. (2) An introductory 
economics course that provides an accu- 
rate portrayal of what a student might ex- 
pect in future courses in the major may 
(appropriately) dissuade many from ma- 
joring. But this does not mean the course 
is a failure, any more than it means that 
the students' talents and/or interests are 
"bad." A model of student curricular 
choice formulated by McKenzie and Staaf 
[104, 1974] and tested empirically by Alan 
Freiden and Staaf [53, 1973] implies that 
students who are superior in both verbal 
and mathematical ability tend to switch 
majors less frequently than other students 
because they do not have to compromise 
their interests in order to achieve the 
highest grades possible for them. Thus, if 
student enrollment in voluntary experi- 
mental courses is correlated with abilities,
inferences from the fraction who subse-
quently major in economics may be con- 
taminated. (3) The advantage of effective 
experimental teaching methods may ac- 
crue to other courses or leisure time, de- 
pending on students' time allocation deci- 
sions. (4) When there are changes in the 
demand for graduates of different disci- 
plines, the relative benefits from majoring 
in certain fields will be altered and induce 
a shift in student selection [52, Richard 
B. Freeman, 1971]. Since the introduction 
of new teaching technologies may accom- 
pany exogenous shifts in demand, there 
is risk of incorrectly attributing "success" 
to a new learning technology if majoring 
in economics is part of the output matrix. 

D. Attitude Sophistication and Values 

One of the goals of economics education 
is greater sophistication about policy is- 
sues. Economics training may also make 
students more conservative or more lib- 
eral. Researchers have attempted to mea- 
sure both effects. 

William R. Mann and Daniel R. Fusfeld 
have argued that "the attainment of a high 
level of attitude sophistication should be 
as much a goal as the proper manipulation 
of supply and demand schedules" [98, 
1970, p. 125]. They developed a Question- 
naire of Economic Attitudes (QEA) and 
concluded that the attitude sophistication 
of an experimental group improved while 
that of a control group without economics 
instruction did not. Mitchell P. Rothman 
and James H. Scott, Jr., criticized the 
measuring instrument as favoring liberals 
[126, 1973]. If they are right, Mann and 
Fusfeld's evidence suggests that econom- 
ics makes students more liberal. Howard 
Tuckman [166, 1975] used a 20-item Atti- 
tude Sophistication (AS) test along the 
lines of Mann and Fusfeld's. He found that 
"an increase of about 3.3 points on the 
pre-AS exam adds one point to the post- 
AS score and 0.17 points to the final 
grade." He attributed the significant cor- 
relation of the AS score with the grade
to its measuring "economic reasoning 
rather than prior knowledge" [166, 1975, 
p. 36]. The AS test was not published. If 
it is free of bias, further research with it 
might be worth pursuing. 

The 1973 article by Rothman and Scott 
[126] and another paper by the same au- 
thors [139, Scott and Rothman, 1975] re-
port the results of administering a Social
Opinion Questionnaire (SOQ) to students 
at Carnegie-Mellon. Whereas the QEA 
and the AS were intended to measure 
progress toward increased sophistication 
from studying economics, the SOQ was 
intended to "separate the students along 
a 'liberal-conservative' dimension." In the
first study published, Rothman and Scott 
found that "conservative answers are al- 
most invariably associated with higher ex- 
pected TUCE scores at the start of the se- 
mester [i.e., before the study of economics 
began]. Equally important is the fact that 
the most significant questions are those 
associated with an individual's prefer- 
ences for the capitalist market system" 
[126, 1973, p. 121]. At the end of the 
course, which included some microeco- 
nomics but emphasized macro, there was 
no significant difference between liberals 
and conservatives on Part I of the TUCE 
(which, like the course, includes some mi- 
cro but emphasizes macro). The authors 
concluded that the SOQ measures some- 
thing other than economic knowledge, 
that conservatives on entering the intro- 
ductory course know more economics 
than liberals, and that there is no system- 
atic bias in the TUCE, Part I, when used 
at the end of the introductory course. 
Since the sample size was only 49 students 
at one school, these conclusions must be 
deemed exceedingly tentative.

The second paper throws more light on 
the effect of an introductory psychology 
course than on an introductory economics 
course. The psychology instructors were
more liberal than their students and influ-
enced them in a liberal direction. Scott
and Rothman report that "the college ex-
perience is a liberalizing one" [139, 1975,
p. 109], but the psychology course had a 
stronger liberalizing effect than the eco- 
nomics course [139, p. 110]. 

Kim Sosin and Campbell R. McConnell 
investigated the effect of an introductory 
macroeconomics course on attitudes to- 
ward income distribution [156, c. 1978]. 
They found a significant shift in attitude 
toward more egalitarianism compared to 
a control group. The largest shifts were 
by students with the least incoming ten- 
dency toward egalitarianism and those 
with the highest grades. Inasmuch as Mil- 
ton Rokeach [125, 1973] has shown that 
in the United States the liberal-conserva- 
tive spectrum can be reduced to prefer- 
ence for equality, the finding of Sosin and 
McConnell that a macro course made stu- 
dents more liberal is convincing even 
though their survey was confined to a sin- 
gle issue. 

In conclusion, it has not yet been estab- 
lished that attitude sophistication is a 
measurable output. On the question of po- 
litical orientation, the evidence cited 
tends toward the conclusion that econom- 
ics has a liberalizing influence. But the hy- 
pothesis that microeconomics gives stu- 
dents an understanding of markets that 
makes them more conservative has not 
been disproved. 23 

E. Distribution of Benefits 

The economic efficiency of new teach- 
ing methods should be assessed by com- 
paring marginal benefits with marginal 
costs. Marginal benefits depend on the im- 
pact of the method on direct output units 
(e.g., cognitive achievement, student atti- 

23 Cf. Stigler [158, 1959] and Fels [50, 1978]. Some
authors have concluded that microeconomics had 
a conservative influence but, having doubts about 
their evidence, we do not cite them. Fred A. Thomp-
son found that increased knowledge of international
economics was associated with a shift to a more favor-
able attitude toward free trade [163, 1973].
tudes) and the value that recipients place
on the outputs. Hansen, Kelley, and Weis- 
brod observe that different opinions about 
who are "most 'important' to teach may
be at the root of much of the controversy 
[surrounding instructional techniques]" 
[61, 1970, p. 365]. For example, students 
differ in previous preparation, verbal and 
mathematical abilities, career goals, sex, 
social concern, and many other character- 
istics. Consequently, they are likely to 
benefit differentially from any particular 
instructional method. It is impossible to 
assess the economic efficiency of a particu- 
lar method without valuing the benefits 
received by various individuals. Typically 
we rely on market forces and equilibrium 
competitive prices to assign values to the 
marginal outputs. The peculiar financing 
of higher education, the market failures 
in appropriating benefits, and the inability 
to attribute consequences to causes (earn- 
ings to courses) make ascertaining such 
values difficult. Nevertheless, it is impor- 
tant to consider explicitly the distri- 
butional effects. 

The two methods for incorporating dis- 
tributional effects into typical production 
function studies of instructional methods 
are either to include interaction variables 
in the model — interactions between the 
experimental treatment variable and the 
important characteristics that vary among 
students, e.g., sex, family income, natural 
ability— or to estimate the relationship 
separately for various groups of students. 
These techniques have been used occa- 
sionally in recent years (e.g., Kelley [80, 
1972]; Allison [2, 1976]) and the findings 
reveal the importance of considering sep- 
arately the impact of instructional meth- 
ods on various groups of students. How- 
ever, little progress has been made 
(indeed, maybe none can be) in assessing 
the differential valuation that different 
students place on receiving increments of 
cognitive achievement or changes in atti- 
tudes. 
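
The two approaches just described can be illustrated with a minimal sketch. The records below are hypothetical (invented for illustration, not drawn from any study cited here), and the binary aptitude split is an assumption. With a binary treatment and a binary student characteristic, estimating the treatment effect separately for each group carries the same information as an interaction term: the interaction coefficient in a regression that also includes both main effects equals the difference between the two group-specific effects.

```python
from statistics import mean

# Hypothetical records: (score gain, used new method?, low-aptitude student?)
records = [
    (6, False, False), (8, False, False),
    (7, True, False), (7, True, False),
    (2, False, True), (4, False, True),
    (5, True, True), (7, True, True),
]

def treatment_effect(rows):
    """Mean gain of users of the new method minus mean gain of nonusers."""
    treated = [gain for gain, used, _ in rows if used]
    control = [gain for gain, used, _ in rows if not used]
    return mean(treated) - mean(control)

# Pooled estimate: one effect for all students.
pooled = treatment_effect(records)

# Separate estimates for each group of students.
effect_high = treatment_effect([r for r in records if not r[2]])
effect_low = treatment_effect([r for r in records if r[2]])

# Equivalent to the interaction coefficient in the saturated regression.
interaction = effect_low - effect_high
```

In this invented sample the pooled effect of 1.5 points hides an effect of 3 points for low-aptitude students and none for high-aptitude students, which is the kind of pattern a pooled estimate alone would miss.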



III. The Impact of 
Human Capital and College Environment 
on Economics Education 

This section summarizes the research 
findings on the effect of student human 
capital, faculty human capital, college en- 
vironment (other capital), and student ef- 
fort on learning economics. There is, to 
our knowledge, only one study of the ef- 
fect of faculty effort on learning econom- 
ics. Allison [2, 1976] found that instruc- 
tor preparation time increased student 
achievement, ceteris paribus. 

The focus of most evaluative studies in 
economics education is on new learning 
technologies. Capital and variable inputs 
are usually included in the regression 
equations in order to control for other fac- 
tors that may be correlated with student 
learning and the presence of the innova-
tive technique (e.g., students with higher
aptitudes may be more inclined to elect 
innovative techniques if they are given 
the option, in which case one might erro- 
neously attribute their better perform- 
ance to the innovation when, in fact, it 
was caused by their superior aptitude). 
The number of studies that consider such 
variables is enormous. Therefore we limit 
references to them. 
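
The selection problem described above can be sketched with invented numbers. In the hypothetical sample below the innovation has no effect at all, but high-aptitude students are more likely to elect it. A stratified comparison is used here as a simple stand-in for including aptitude as a control in a regression; all names and scores are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical records: (test score, elected innovation?, aptitude stratum)
students = [
    (21, True, "high"), (22, True, "high"), (23, True, "high"),
    (22, False, "high"),
    (12, True, "low"),
    (11, False, "low"), (12, False, "low"), (13, False, "low"),
]

def difference_in_means(rows):
    """Mean score of students who elected the innovation minus the rest."""
    users = [score for score, elected, _ in rows if elected]
    nonusers = [score for score, elected, _ in rows if not elected]
    return mean(users) - mean(nonusers)

# Naive comparison: ignores that users are mostly high-aptitude students.
naive_effect = difference_in_means(students)

# Controlled comparison: compare users and nonusers within each stratum,
# then weight each stratum's difference by its share of the sample.
strata = sorted({stratum for _, _, stratum in students})
controlled_effect = sum(
    difference_in_means([r for r in students if r[2] == s])
    * sum(1 for r in students if r[2] == s) / len(students)
    for s in strata
)
```

The naive comparison credits the innovation with a 5-point advantage, while the comparison within aptitude strata shows none, so the apparent gain in this sample is entirely due to who chose the technique.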

A. Student Human Capital 

Three types of variables have been used 
to assess the impact of student human cap- 
ital on economics education output. First, 
a variety of general aptitude measures 
have been employed. Most studies find 
college entrance examination scores (SAT, 
ACT) to be positive and significantly asso- 
ciated with economics test performance. 
Verbal SATs seem to be more important
than quantitative SATs for the TUCE. 24

24 Mathematical aptitude seems to be more impor-
tant in scoring well on locally constructed tests. This
may reflect the emphasis of the TUCE on applica- 
tions. 
This finding is supported by the usual non-
significance of previous mathematics
courses in such regressions. High school 
class rank, a measure of general aptitude, 
usually has a positive impact on post-test 
examination scores in economics. Mea- 
sures of student maturity, such as age, year 
in school, and number of previous college 
credit hours, usually show no relationship
to cognitive performance. 25 The few stud-
ies that have included measures of stu-
dents' socioeconomic backgrounds have 
found such variables as family income 
and/or parents' education to be unimpor-
tant.

Second, measures of prior knowledge 
of economics have produced mixed re- 
sults. The effect of high school economics 
courses on performance in college eco- 
nomics has been widely investigated, but 
the results are inconclusive. The most re- 
cent study concludes that after adjusting 
for other factors, students who had taken 
previous economics courses did not begin 
their principles course with significantly 
more knowledge, nor did they learn sig- 
nificantly more during the semester [120, 
John Palmer, Geoffrey Carliner, and 
Thomas Romer, 1979]. The most obvious 
predictor of the post-test score in intro- 
ductory economics is the pre-test score. 
It is almost always found to be positive 
and significant. However, when gap-clos- 
ing measures are used, the pre-test score 
has been found to exert a negative influ- 
ence on the post-test score. This suggests 
that students who know more economics 
at the beginning of the course learn rela- 
tively less during the course, but Becker 
and Salemi have shown that this apparent 
paradox may be explained by simulta- 
neous equations bias in most of the studies 
[18, 1977]. 
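
For readers unfamiliar with gap-closing measures, a minimal sketch follows. It assumes the usual definition, the share of the possible improvement that a student actually achieves, that is, (post - pre) / (maximum - pre). The function name, the 33-point maximum, and the scores are hypothetical.

```python
def gap_closing(pre, post, max_score):
    """Share of the possible improvement (max_score - pre) actually achieved."""
    if pre >= max_score:
        raise ValueError("no gap to close: pre-test score is already at the maximum")
    return (post - pre) / (max_score - pre)

# Hypothetical students on a test with an assumed maximum of 33 points.
weak_start = gap_closing(pre=10, post=20, max_score=33)    # absolute gain of 10
strong_start = gap_closing(pre=25, post=29, max_score=33)  # absolute gain of 4
```

Here the student with the higher pre-test score has the smaller absolute gain but closes a larger share of the remaining gap, so the two measures can rank students differently.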

Third, many studies have included stu- 

25 The non-significance of many control variables
may be due to multicollinearity, in which case a test
of their joint significance would be more appropri- 
ate. 



dent major and indexes of student inter- 
est, presumably to test the hypothesis that 
learning is related to motivation. The re- 
sults of these tests are mixed, which may 
reflect the variety of instruments used to 
measure interest. 

B. Faculty Human Capital 

Several studies have found that years 
of teaching experience, instructor TUCE 
scores, and instructor's graduate school 
grades are positively related to learning 
in introductory economics classes. In addi- 
tion, the studies that have attempted to 
explain cognitive achievement with 
course evaluation scores may be directed 
toward the question of whether faculty
human capital affects student learning. If 
in evaluations students rate highly the 
characteristics of instructors that help 
them learn more efficiently, then we 
would expect students of professors with 
higher evaluations to perform better on 
examinations. Many studies that use multi- 
ple instructors of a single course find that 
the particular instructor does make a dif- 
ference. Apparently differences among in- 
structors are not entirely captured in the 
models. 

C. College Environment 

Studies of class size are almost unani- 
mous in finding no influence on test scores. 
Harry Levin found no difference in cogni- 
tive achievement between students in 
classes of 30 and students in classes of 80 
to 120 [90, 1967]. However, students rated 
the course higher if they were in smaller 
classes. The latter finding has been con- 
firmed by Mirus [106, 1973]. Lewis and 
Dahl, while finding the usual lack of im-
pact of class size on TUCE performance,
discovered a negative effect of class size 
on students' critical thinking skills [92, 
1972]. In general, it appears that students 
are happier and perhaps learn to think 
better in smaller classes, but performance 
on standardized tests is independent of
class size. 

Attiyeh, Bach, and Lumsden found that 
students from larger colleges and from 
colleges with higher schoolwide SAT aver-
ages did better on the TUCE [6, 1969].
In addition, they found substantial varia- 
tion in performance on the basis of type 
of school. 

Most research has found that two semes-
ters of introductory economics yield
greater understanding than one. The lat- 
est of these studies [42, Ralph Elliott, M. 
Edwin Ireland, and Teresa Cannon, 1978] 
reports that two-semester students at 
Clemson University do significantly better 
on each of various functional forms of 
TUCE scores. The gains, however, are 
small in comparison to a one hundred per- 
cent increase in inputs (a second semes- 
ter). 

In most studies of the question, the 
choice of textbook did not matter. In Saun- 
ders's study there was no significant differ-
ence in TUCE scores for students using
any of five leading introductory textbooks 
listed by author's name [132, 1973, p. 60 
and Table 15]. For seniors who had had 
the introductory course two years earlier, 
however, those who had had textbooks 
other than the five named did significantly 
worse, as did those alumni whose textbook 
was unknown. Attiyeh, Bach, and Lums- 
den, on the other hand, found that stu- 
dents using either of two leading conven- 
tional textbooks did significantly better 
than those using any of ten books lumped 
together in an "all other" category, 
whereas those with a third leading text 
did not [6, 1969, pp. 220-21]. As pointed
out in Subsection IV.D below, they also 
found a difference between two pro- 
grammed texts. 

D. Student Effort 

It is not clear whether student effort
should be an independent (control) or de- 



pendent variable. Indeed, one of the im- 
portant insights of the application of mi- 
croeconomic theory to the learning pro- 
cess is that students may well choose to 
take efficiency gains in the form of re- 
duced time inputs into their economics 
course [104, McKenzie and Staaf, 1974]. 

A number of studies have attempted to 
correlate student effort with test perfor- 
mance, the best of which are by Becker 
and Salemi [18, 1977] and Allison [3, 
1977]. With few exceptions, these studies 
have found no impact of study time on
performance. In a simultaneous equations 
model, Allison finds low positive elastici- 
ties of achievement with respect to stu- 
dent effort [3, 1977]. However, her results 
indicate that students "learn to learn"; the 
elasticity, while remaining low relative to 
that of student ability, quadrupled from 
the first to the second semester course.
Other measures of student effort confirm 
this conclusion. Attendance and student 
course load (which measures competing 
demands on student time) do not appear 
related to performance on standardized 
tests. 26 

Grade point averages are sometimes in- 
terpreted as measures of student effort, 
especially when included simultaneously 
with aptitude test scores. Grade point 
average (GPA) is usually found to be posi- 
tively related to test performance, which 
might indicate that GPA is a better mea- 
sure of aptitude than student effort. 

In sum, it appears that a student's gen- 
eral (especially verbal) aptitude is the most 
important determinant of learning. So- 
cioeconomic background, prior econom- 
ics courses, mathematics preparation, 
class size, textbooks, and study effort do 
not seem to matter very much. The evi- 
dence on the effect of student interest on 
test performance is mixed. 

26 Class attendance does seem to be important for
achievement on locally constructed tests, but not
standardized ones [119, Donald W. Paden and M.
Eugene Moyer, 1969].
IV. The Impact of
Alternative Teaching Methods 
on Economics Education 

A. Games and Computer-Assisted 
Instruction 

The computer has been employed in 
economics instruction in two general
ways: (1) games, simulation models, and
demonstration routines (CAI); and (2) 
study management systems (CMI). CAI in- 
cludes the popular macroeconomic mod- 
els of the economy, in which students at- 
tempt to set policy parameters in order 
to achieve a menu of specified macroeco- 
nomic goals that involve trade-offs. CMI 
consists mainly of review routines (short 
quizzes) with instant feedback to students 
and individualized instructions to students 
wanting to know the most efficient study 
strategy to pursue. It is discussed in the 
next section. 

Because of their sequential nature, com- 
putational requirements, and record- 
keeping needs, many games and simula- 
tions utilize computer facilities, especially 
time-sharing. Games, however, can be 
played without the use of computers. Most 
games involve role playing — a stock- 
market trader, a manager of a firm under
various degrees of competition, an urban
planner, or an advisor to the finance minis- 
ter — where the student is required to map 
out strategies over time. Thus games tend 
to emphasize the interactions in an eco- 
nomic situation and are more likely to 
teach analytical methods successfully than 
recognition and understanding of eco- 
nomic terms and concepts. 

In a 1974 review of the computer-as- 
sisted instruction (CAI) literature, Soper 
listed 45 published and unpublished re- 
ports of the use of CAI in teaching eco-
nomics [152, 1974]. Most of the reports
described the use of games and simula- 
tions in teaching principles of economics 
or intermediate macroeconomics. Since 



almost all of the games that have been 
evaluated use the computer, we hence- 
forth use CAI to mean games and com- 
puter-assisted instruction. 

There are several reasons to expect CAI 
to influence learning. Instant feedback (at 
least with time-sharing), novelty, and con- 
venience all should serve to alter students' 
learning functions. Quick reinforcement 
of correct responses or immediate expla- 
nation of incorrect responses has long 
been one of the most widely accepted 
principles for improving learning. The 
novelty of computer-assisted instruction 
may improve student attitudes and, in 
turn, learning. The convenience of time- 
sharing may influence learning by permit- 
ting students to use their time more effi- 
ciently. 

Simulation models, which apply the 
principles gleaned from comparative stat- 
ics to a dynamic world, may give students 
an appreciation for the difficult problems 
confronting policy-makers. Computer oli-
gopoly games instill respect for the dy-
namic strategies that complicate the real 
industrial world and require complicated 
models to characterize reality. Games are 
(usually) less abstract than lectures and 
may stimulate learning because they are 
realistic. Finally, most students seem to 
like games, and student attitudes may af- 
fect cognitive performance [23, Samuel 
Bowles, 1970]. 

The experiments testing the effective- 
ness of games and/or computer-assisted 
instruction assess the impact of CAI on 
(1) cognitive evaluation instruments, such 
as the TUCE; (2) student attitudes; (3) the 
lasting effects of greater cognitive 
achievement of students who experienced 
CAI; and (4) the distribution of benefits 
between high- and low-achieving stu- 
dents. Overall, the conclusions about the 
effectiveness of CAI in improving under- 
standing of economics are pessimistic. 

Wentworth and Lewis carefully evalu- 
ated the noncomputer learning game 






Siegfried and Fels: Teaching College Economics 



called Marketplace at two Minnesota ju- 
nior colleges during 1971-72 using multi- 
ple linear regression analysis and a sample 
of 149 students divided between two in- 
structors and users and nonusers of the 
game (four cells) [176, 1975]. They found 
that playing the game as a substitute for 
eight class periods of conventional lecture 
instruction had a statistically significant
impact on students' gain in TUCE scores 
during the semester. The effect, however, 
was negative. Students attending the lec- 
tures gained 1.43 points more on the 
TUCE during the semester after control- 
ling for intelligence (ACT), age, college, 
high school economics background, sex, 
and student interest in economics. 

The most complete study of CAI is re- 
ported by Davisson and Bonello [40, 
1976]. They describe the development of 
computer-assisted instruction in princi- 
ples of economics at Notre Dame during 
the 1973-75 period and report the results 
of an evaluation of its cost effectiveness. 
Review routines comprise the major part 
of the CAI program at Notre Dame and 
consist of multiple choice questions with 
prompts and verification statements. 
There are also demonstration and game 
simulation routines. The impact of CAI 
on cognitive performance in the Notre 
Dame experiment was evaluated using 
the TUCE. The findings were similar to 
many of the other experiments with 
CAI — no significant difference between 
the experimental and control groups. 

John Chizmar et al. [31, 1977] adapted
Alan Blinder's [19, 1973] methodology for
separating shifts in the production func- 
tion from shifts in the coefficients of vari- 
ables included in the model and found 
that Illinois State students using the Notre 
Dame CAI package performed slightly 
better on the TUCE; but they did this in 
spite of CAI, primarily because of greater 
ability. 

In a regression analysis of the effective- 
ness of CAI games in a macroeconomics 



course at Arizona State University, Steven 
Cox found no effect of CAI games on stu- 
dent performance on multiple choice tests 
[36, 1974]. He experimented with nonlin- 
ear relationships between the indepen- 
dent and dependent variables, but found 
that a simple linear model produced the 
same conclusions as the more complex 
specifications. 

In an evaluation of the effectiveness 
of one micro and two macro games at 
St. Olaf College, E. David Emery and 
Thomas Enger assessed student perfor- 
mance on various types of multiple choice 
questions [43, 1972]. They found that CAI 
led to higher achievement on questions 
requiring analysis and policy decisions. 27 
However, in a follow-up to the St. Olaf 
CAI experiment, Emery and Jean 
Schoene found that the initial advantage 
of CAI students had completely disap- 
peared over 16 months [44, c. 1974]. 

Most of the reports of CAI activities in- 
dicate that students enjoy playing com- 
puter games. However, Davisson and Bo- 
nello's extensive analysis of the impact of 
CAI on student attitudes revealed no dif- 
ferences between the CAI users and con- 
trol groups [40, 1976]. 

On the question of the distribution of 
benefits from CAI, Cox found that all stu-
dents except those who earned C grades
benefited [36, 1974]. Emery and Enger 
found no difference in benefits on the basis 
of student aptitude [43, 1972]. Bonello, 
Davisson, and Swartz report that lower 
achievement students in the Notre Dame 
experiment benefited from CAI, while 
better students did not [20, 1978]. This 
occurred in the presence of a nonsignifi- 
cant overall effect and illustrates why it 
is important to identify precisely the tar- 

27 Because students in the experimental group did 
not do significantly better on all types of questions,
they concluded that their better performance on 
some questions was not due to an incentive (Haw- 
thorne) effect. There is no reason to expect incentive 
effects to influence only one type of learning. 






Journal of Economic Literature, Vol XVII (September 1979) 



get group and assess the impact of educa- 
tional experiments on specific subgroups 
of students. 

Davisson and Bonello attempted to as- 
sess the costs of implementing CAI [40, 
1976]. They estimated that start-up costs 
were $20,000 in 1973-74, but their esti- 
mate is based on a very low valuation of 
faculty time ($1,500 per month, implying 
a $13,500 academic-year salary, which in 
1973-74 was about the starting salary of 
assistant professors). The faculty who have 
the skills to develop CAI are also likely 
to be the faculty who have skills sufficient 
to ensure a high opportunity cost of time
during summers. 

Operating costs can be assessed on ei- 
ther a budgetary or economic basis. Bud- 
getarily, CAI will probably cost much 
more than conventional instruction be- 
cause universities tend to assess user 
charges for computer use but not for the 
alternatives such as library use. However, 
in real economic costs, the difference may 
be much less. An input that can be dis- 
pensed with if CAI is adopted is graduate 
teaching assistants. Davisson and Bonello 
found that CAI was no more expensive 
than graduate teaching assistants meeting 
discussion sections once a week. On the 
other hand, if the opportunity cost of grad-
uate TAs is very low to a department (that
is, they are there and supported finan- 
cially, not to provide instruction to under- 
graduates but because the faculty wants 
a graduate program for some other rea- 
son), then CAI will be more expensive. 

Overall, games and CAI in economics 
do not appear to be the route to nirvana 
they were once expected to be. CAI ap- 
pears to generate no more (or no less) cog-
nitive achievement, 28 but probably costs

28 One possible explanation for the negative find-
ings can be found in a warning from one of the
pioneers in developing computer games, Myron Jo- 
seph, who cautioned that "there is some danger that 
the complexity permitted by the computer will ob- 
scure the learning objectives of simulation" [73, 
1970, p. 95]. 



more than conventional pedagogical 
methods. In addition to instructor and 
computer-facility costs, most games and 
computer-simulation exercises are stu- 
dent-time intensive. They are unlikely to 
be generating benefits in the form of re- 
leased student time. 29 

B. Computerized Study Management 

There have been at least three large-
scale efforts to utilize computer facilities
to individualize and improve student 
study management. Kelley [78, 1968; 80, 
1972; 82, 1973] reports on the effective- 
ness of the Teaching Information Process- 
ing System (TIPS) at the University of Wis- 
consin, Bernard Booms and D. Lynne 
Kaltreider [21, 1974] on Computer Gen- 
erated Repeatable Testing (CGRT) at 
Pennsylvania State, and Donald Paden, 
Bruce Dalgaard, and Michael Barr [118, 
1977] on the computer study manage- 
ment system (PLATO) at the University 
of Illinois. These three experiments have 
been conducted at large state institutions, 
where large classes and impersonal atmos- 
pheres are more likely to create need for 
individualization of instruction. 

Computerized study management sys- 
tems usually administer periodic short 
quizzes and then provide rapid feedback 
to students. Kelley's system evaluates stu-
dents' learning problems (from "surveys"
that do not count toward students' grades) 
and provides a personalized assignment 
suited to the level of understanding of 
each student. In addition, such programs 
usually provide record-keeping and test 
analysis and grading services. 

Study management systems are ex- 
pected to improve student learning by 
"directing student activity." This implies 
that someone other than the student 

29 Short, non-computerized classroom games,
however, may be very inexpensive. Unfortunately 
there is an absence of solid controlled assessments 
of such games, so little can be said about their effi- 
ciency. 






knows best how the student learns most 
efficiently. Study management systems 
could influence learning by improving stu- 
dent attitudes or by providing individual- 
ized help. In addition, Kelley uses the 
TIPS system to inform the lecturer and 
teaching assistants of student progress. 
The study management systems also en- 
courage students to keep up, thereby 
avoiding the deleterious effects of cram- 
ming. 

Kelley performed controlled experi-
ments of the effectiveness of TIPS and
found it had a significant positive effect 
on achievement [79, 1970; 80, 1972]. TIPS
achieves its effect by providing addi-
tional, less difficult assignments to
low-achieving students, and
fewer, more difficult assignments to high- 
achieving students, thereby directing 
teaching and student resources to the 
place where their marginal product is 
higher. Kelley reports that TIPS is not 
more expensive than conventional in- 
struction because it may cut back on as- 
signments to bright students while in- 
creasing assignments to low achievers [80, 
1972, p. 426]. A follow-up test a year later 
indicated that the TIPS cognitive differen- 
tial was maintained [80, p. 426]. Most of 
the benefits of TIPS appear to accrue to 
low achievers [80, p. 425]. 

The Pennsylvania State system (CGRT) 
is similar in many respects to TIPS, but 
the computer tests count in students' 
grades, while Kelley uses them only for 
study practice. Booms and Kaltreider 
found higher mean TUCE scores for 
CGRT students than for a control group 
[21, 1974]. Student opinions of CGRT 
were favorable, but the system cost almost 
50 percent more than conventional in- 
struction. 

PLATO consists of instructional se-
quences and elaborate testing and record-
keeping capabilities. The Paden et al. 
evaluation of PLATO found that mean 
scores on an unspecified cognitive test 



were higher, but they did not attempt to 
hold human capital and utilization rates 
constant [118, 1977]. This deficiency is 
dangerous in view of the findings of Chiz-
mar et al. that students who elect com-
puter-assisted instruction may have char- 
acteristics that permit them to do well in 
spite of CAI [31, 1977]. 

In sum, the computer-aided study man- 
agement systems seem to perform much 
better than games and simulation rou- 
tines. Although there were no clear-cut 
cost advantages to computerized study 
management, it does not appear to be sig- 
nificantly more expensive. 

C. Programmed Instruction 

Programmed instruction, brainchild of 
the behavioral psychologist B. F. Skinner 
(see Skinner and J. G. Holland [148, 1961])
has been the subject of one major and a 
number of minor studies in economics ed- 
ucation. The major study, by Attiyeh, 
Bach, and Lumsden (A-B-L) [6, 1969; 7, 
1970], found a large gain in efficiency from 
using programmed instruction. The minor 
studies have not seriously challenged this 
result. 30 Nevertheless, programmed in- 
struction is not widely used in econom- 
ics. 

Programmed instruction is based on 
certain psychological principles of learn- 
ing, particularly positive reinforcement, 

30 In the chief of the minor studies, Attiyeh and 
Lumsden showed that programmed instruction writ-
ten for high school use was superior to the alternative
in a Palo Alto school [8, 1965]. Bach found that a 
superior teacher could outperform the better of the 
two programmed texts used in the study by Attiyeh, 
Bach, and Lumsden [6, 1969]. Soper [150, 1973] and 
Donald Darnton [39, 1971] have developed methods 
of overcoming student antipathy to programmed in- 
struction. Other studies include Stephen Buckles and 
Marshall E. McMahon [27, 1971], Paden and Moyer
[119, 1969], Fels and Dennis R. Starleaf [51, 1963],
Fusfeld and Gregory Jump [54, 1966], Thomas Hav-
rilesky [66, 1971], Dennis Weidenaar [169, 1972],
Attiyeh and Lumsden [10, 1972], Lumsden [95,
1967], Phillip W. Tiemann, Paden, and Charles J.
McIntyre [164, 1966], and William C. O'Connor
[117, 1974]. 
active involvement of the student, prompt
feedback, encouragement through psy- 
chological rewards, and instruction of 
complex ideas by breaking them down 
into small increments. The material is 
presented in a succession of "frames," 
each usually consisting of one or two sen- 
tences with blanks for the student to fill 
in. The students find out immediately 
whether their answers are correct by look- 
ing at the bottom of the page or on the 
next page. They almost always are correct. 
They therefore are supposed to get a feel- 
ing of accomplishment that will encour- 
age them to go on, and on the infrequent 
occasions that they are wrong, their errors 
are corrected immediately. Programmed 
instruction contains a great deal of irregu-
lar repetition to reinforce learning of im-
portant concepts. In contrast to the usual 
encyclopedic textbook, the writing of pro- 
grammed instruction requires the authors 
to decide exactly what they want to teach 
and to omit everything else, a characteris- 
tic designed to make learning more effi- 
cient. 

A-B-L conducted a nationwide experi- 
ment on programmed instruction involv- 
ing 48 schools and 4,121 students. Each 
participating school established three test 
groups. Group I used one of two program- 
med texts exclusively for three weeks, 
studying a total of 12 hours per student 
on the average. At the end of this period, 
the students were tested (and from then 
on received conventional instruction). 
Group II students received conventional 
instruction supplemented with the pro- 
grammed learning text (on which they 
spent an average of eight hours). Students 
in Group III had only the conventionally 
taught course. Groups II and III were 
tested after they had had an average of 
seven weeks of their respective courses. 

A-B-L used regression analysis with the 
number of correct answers on a prelimi- 
nary version of the TUCE as the depen- 
dent variable. All the variables rep- 



resenting student characteristics (e.g.,
educational level, sex, SAT score) were sta-
tistically significant at the 95 percent con- 
fidence level and quantitatively impor- 
tant. SAT score was the most important 
single determinant. In terms of school 
characteristics, average freshman en- 
trance examination score, school size (with 
a positive coefficient), and school type 
were statistically significant. The coeffi- 
cient for state colleges was positive and 
statistically significant, the coefficients for 
liberal arts colleges and for large state uni- 
versities were not significantly different 
from zero, and the coefficient for "pres- 
tige" schools was significant and nega- 
tive. 

The major results were: (1) on the aver- 
age, spending 12 hours studying a pro- 
grammed text over a three-week period 
was approximately equivalent to spending 
seven weeks in a conventionally taught 
course; (2) students using programmed 
learning performed relatively better on 
applications than on recognition-and-un- 
derstanding questions; and (3) students 
had a positive attitude toward program- 
med learning, generally considering it 
more effective than interesting. 

The A-B-L study is notable for being 
one of the few in economics education 
based on a large sample of schools and 
students. It is also notable for having been 
largely ignored by a profession that nor- 
mally takes a great interest in questions 
of efficiency. Despite the finding that pro- 
grammed instruction can accomplish al- 
most as much in three weeks as conven- 
tional instruction in seven, there was no 
rush by economics professors to adopt it. 

D. Personalized System of Instruction 

The personalized system of instruction 
(PSI) pioneered by the psychologist Fred 
S. Keller [77, 1974] is quite different from 
Skinner's programmed instruction but is 
based on similar principles. The enthusi- 
asm it has generated in fields other than 
economics, though reminiscent of the 
early days of teaching machines, appears 
to go deeper and have more staying 
power. 

Like programmed learning, PSI re- 
quires instructors to decide exactly what 
they want students to learn. Students are 
given short assignments accompanied by 
specific statements of behavioral objec- 
tives. 31 Students study the assignments at 
their own pace. When they feel ready on 
the first assignment, they take a test which 
is immediately graded for mastery (usually 
defined as 90 percent). A student who 
passes goes on to the next assignment. Stu- 
dents who do not pass are recycled, i.e., 
they study the assignment some more and 
take another test, repeating the proce- 
dure as often as necessary to pass. The 
grade in the course depends on the num- 
ber of assignments completed. Proctors 
(usually undergraduates who have already 
had the course) are available to help the 
students and to administer and grade the 
tests. There are few, if any, lectures. As 
with programmed instruction, feedback is 
immediate, and students are expected to 
get a sense of satisfaction from achieving 
mastery (and be motivated by it to go on). 
PSI has been used in many fields and 
schools. In the vast literature on PSI, con- 
trolled experiments typically show that 
students like PSI and think they learn 
more from it than from conventional in- 
struction. Objective data show that they 
have in fact learned at least as much, 
sometimes more. They work harder but 
feel the extra effort is worth it. 
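
The mastery cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Keller's or Fels's actual procedure: the function names, the `take_test` callable, and the `max_attempts` cap are assumptions introduced here (the text says students repeat the test "as often as necessary"), and the 90 percent threshold is the "usual" value mentioned in the text.

```python
MASTERY = 0.90  # "usually defined as 90 percent" in the text


def attempts_until_mastery(take_test, max_attempts=20):
    """Retake a unit test until the score reaches MASTERY.

    take_test is a hypothetical callable returning a score in [0, 1].
    max_attempts is an assumed safety cap, not part of the PSI method.
    Returns the number of attempts used, or None if the cap is reached.
    """
    for attempt in range(1, max_attempts + 1):
        if take_test() >= MASTERY:
            return attempt
    return None


def units_completed(unit_tests, max_attempts=20):
    """Work through units in order and stop at the first one not mastered.

    In a PSI course the grade depends on this count.
    """
    done = 0
    for take_test in unit_tests:
        if attempts_until_mastery(take_test, max_attempts) is None:
            break
        done += 1
    return done


# A student who scores 70, then 85, then 95 percent passes on the third try.
scores = iter([0.70, 0.85, 0.95])
tries = attempts_until_mastery(lambda: next(scores))
```

The sketch makes the self-pacing explicit: nothing in the loop depends on a calendar, only on whether the student has reached mastery of the current unit.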

Fels's PSI course [49, 1974] (see also Ev- 
ans Jenkins [69, 1977]), in contrast to the 
one at Harvard to be described next, fol- 
lowed the Keller model closely. Although 

31 That is, the students are told what they are ex- 
pected to be able to do in operationally meaningful 
terms, such as: given the input requirements in man- 
hours for two countries and two commodities, deter- 
mine the comparative advantage of each. "Under- 
stand the model of comparative advantage" won't 
do; it is too vague. 



the results of an elaborate evaluation were 
similar to those of PSI courses generally, 
there was no objective confirmation of the 
students' impression that they had learned 
more; contrary to the usual experience, 
the students worked no harder than in a 
conventional course; and the chief educa- 
tional gain went to the proctors, who ap- 
parently learned a good deal more than 
they would have from taking intermediate 
economics courses. (See Siegfried and Ste- 
phen H. Strand [143, 1976] and Siegfried 
[141, 1977].) 

Self-pacing is a fundamental part of the 
PSI method. Quick students are not held 
back by slow ones and may finish the 
course early. Slow students are not com- 
pelled to stay in lockstep with the class. 
All students can work at times most con- 
venient for their other activities. 

Self-pacing can be adopted without go- 
ing the whole PSI route. In the fall of 1972, 
Harvard began an experiment in which 
self-paced instruction was used over a 
three-year period in three sections each 
with 25 to 30 randomly-selected students 
(Allison [2, 1976; 3, 1977]). The distin- 
guishing features of the program were: (1) 
the course was divided into only eight 
units per semester for separate testing 
(usually PSI courses are broken into 20 
or 30 units); (2) the sections had regular 
class meetings with voluntary attendance; 
and (3) upper-classmen and graduate stu- 
dents were used as graders. 

For evaluation it was assumed that stu- 
dents allocate effort among courses and 
other activities to maximize some utility 
function. A three-equation model was 
constructed with an effort equation, an 
achievement equation, and an enjoyment 
equation; later a fourth equation was 
added for major. These simultaneous 
equations were designed for estimation of 
a production function. 

The self-paced effect on student enjoy- 
ment was found through repeated ques- 
tionnaires to be largely attributable to a 
Hawthorne effect. 32 Self-pacing increased 
performance on tests by 15 percent with 
the differential increasing two years after 
the course. The advantage was greater for 
freshmen and for students with SAT scores 
below 700 than for more advanced and 
better students, but the differences were 
small and variable. Controlling for effort 
showed that self-pacing actually increased 
students' learning instead of merely in- 
ducing them to work harder. The results 
are consistent with the hypothesis that the 
crucial feature of self-pacing is the interac- 
tion between grader and student, a di- 
rected form of one-to-one instruction. 33 
The positive results from self-pacing ob- 
tained at Harvard, together with the large 
number of successful experiments in other 
fields, suggest that self-pacing has a signifi- 
cant contribution to make to economics 
education. 

E. Video 

A televised course on "The American 
Economy" with John R. Coleman as the 
principal teacher ran on 182 CBS stations, 
54 educational stations, and 5 indepen- 
dent stations during 1962-63. Of the 160 
half-hour lessons, 128 were on economic 
content, the other 32 on teaching meth- 
ods. The total audience was over a million, 
with 5,000 taking the course for credit at 
361 colleges and universities. The results 
were evaluated in three separate research 
projects. 

In a study reported by Saunders, the 
TEU was given to 71 television students, 
113 Carnegie-Tech students taking a one- 
year course in economics, and a control 
group of 73 school teachers "substantially 

32 I.e., students like to take part in experiments. 
This advantage of self-pacing would be lost once the 
method became routine. 

33 In another report Allison surveyed self-paced 
introductory courses at seven schools [1, 1975]. She 
concluded that the reports on them were more use- 
ful as descriptions of the forms and functions of self- 
paced courses than as evidence of their educational 
benefit. 



identical to the TV students, but who had 
not watched 'The American Economy' 
. . ." [127, 1964, p. 398]. The TV group 
scored approximately the same (40.9 ques- 
tions right out of 50 with a standard devia- 
tion of 4.8) as the Carnegie-Tech sopho- 
mores (40.8) and significantly higher than 
the control group (33.9). A multiple re- 
gression analysis identified one other vari- 
able besides taking the TV course for 
credit that significantly affected the out- 
come, namely, a previous course in 
economics. 34 

Campbell R. McConnell and John R. 
Felton reported a controlled experiment 
consisting of 27 students who took the 
CBS-TV course for credit matched with 
27 students from a large live lecture 
course at the University of Nebraska who 
were similar in grade average, number of 
college hours completed, and course of 
study [100, 1964]. On the TEU, there was 
no significant difference between the two 
groups, but on 120 multiple-choice ques- 
tions prepared by the authors, the live 
groups did significantly better on the 60 
conceptual and 40 problem-analytical 
questions. Consistent with the TEU find- 
ings, there was no difference between the 
groups on the 20 factual questions. Since 
the TV course did not aim to go much 
beyond what the Committee for Eco- 
nomic Development's Task Force Report 
[34, 1961] specified high school students 
should know (omitting such concepts as 
marginal cost and marginal revenue), the 
TV course was evidently successful in its 
aims even though less successful in achiev- 
ing the aims of the introductory college 
course. This study is a useful complement 
to the one by Saunders. The authors con- 

34 The value of the study is limited not only be- 
cause it was confined to students and teachers in 
the Pittsburgh area but also because it used the TEU, 
a test designed for high school students. Neverthe- 
less, the substantial correlation found between TEU 
scores and grades based on essay examinations (.70 
for Carnegie-Tech students, .82 for the TV group) 
is reassuring. 
firmed Saunders's findings that the TV stu- 
dents scored just as high on the TEU as 
students in a conventional course, but on 
questions designed for college rather than 
high school students, those in the regular 
course did better. 

The most ambitious study of the TV 
course was a survey by the National Opin- 
ion Research Center reported by Bach 
and Saunders [13, 1965]. About 20 percent 
of the 65,000 high school social studies 
teachers watched the TV course at one 
time or another, but only 5 percent 
watched once a week or more. Over 5,000 
students took the course for credit with 
4,400 completing it successfully, 1,800 of 
whom were school teachers. A 25-item 
version of the TEU was given to 3,966 
teachers. In a multiple regression analysis 
(in which R² was only .15), watching the 
TV course was by far the most important 
variable, its coefficient being twice as 
large as that for taking five or more college 
courses in economics. One or two previous 
courses made no significant difference, 
presumably because of forgetting. Taking 
the TV course for credit, as distinct from 
only watching it, had a negative coeffi- 
cient, apparently because the first group 
was interested in credit, the second in 
learning. 

Although the TV films were subse- 
quently made available for educational 
use, "The American Economy" was essen- 
tially a one-shot affair, enormously expen- 
sive, apparently well worth the dollars 
spent, but not to be repeated for a long 
time. There is nothing in the research re- 
sults on it to indicate that TV lectures are 
superior to live lectures, but they do sug- 
gest that TV can be just as good. 

Although closed-circuit TV is now 
widely used to teach elementary econom- 
ics in schools with large enrollments, pub- 
lished evaluations are scarce. An experi- 
ment in 1964-65 reported by McConnell 
used a matched-pair technique to com- 
pare teaching by McConnell on television; 



McConnell in a large, live, lecture class; 
McConnell in a small, live, lecture class; 
and graduate assistants at the University 
of Nebraska [99, 1968]. In terms of teach- 
ing as measured by 170 multiple-choice 
questions, the differences were not statis- 
tically significant. In terms of student atti- 
tude toward teaching method, TV was sig- 
nificantly worse than each of the other 
options. McConnell stated that TV was 
cheaper but gave no data. McConnell and 
Charles Lamphear reported a follow-up 
experiment at Nebraska in which 440 
principles students were given the option 
of a televised lecture course or a course 
with no lectures at all [101, 1969]. An Om- 
nibus Personality Inventory test indicated 
no significant difference between the 354 
choosing to view the lectures and the 86 
choosing the alternative. The TV lectures 
were textbook oriented. For evaluation, 
a battery of multiple-choice questions was 
used, including 170 internally generated 
items, the TEU, and items provided by 
the committee developing the TUCE. 
There were no statistically significant dif- 
ferences at the 5 percent level. Attitude 
surveys revealed a significant preference 
for the lectureless method. A subsequent 
report by Lamphear and McConnell indi- 
cated that the lectureless group and the 
TV group did better than all students 
taught by graduate teaching assistants the 
previous year [87, 1970]. 

If true, the Nebraska findings would be 
very interesting, but it is not clear how 
much credence can be put in them. Taken 
at face value, they suggest that textbook- 
oriented lectures, whether live or on TV, 
have a value-added close to zero. But they 
are based on one instructor at one 
university. 35 They are, however, consis- 
tent with the findings of Attiyeh, Bach, 

35 There are other questions that might be raised 
about the Nebraska studies. The groups were not 
selected at random; the Hawthorne effect could ac- 
count for some of the results; and the statistical meth- 
ods are not highly sophisticated. 
and Lumsden on programmed instruction 
[6, 1969]. They are also consistent with 
the hypothesis that different students 
learn in different ways. 

Paden and Moyer found no differences 
at the University of Illinois in amount 
learned between groups taught by live 
lectures, TV, and programmed instruc- 
tion, but the attitude toward live instruc- 
tion was more favorable than toward the 
other two [119, 1969]. 

To conclude, the hypotheses that stu- 
dents learn as much from TV lectures but 
prefer live lectures are plausible, but so- 
phisticated testing of them in economics 
has been limited. 36 

F. Specification of Instructional 
Objectives 

The premise that course objectives 
should be stated in terms of specific stu- 
dent behavior has intuitive plausibility. If 
instructors decide in advance exactly what 
they are trying to accomplish, they may 
improve their chances of achieving it, and 
if the students are informed of the goals, 
the efficiency of their efforts may be 
increased. 37 Ideally the instructor speci- 
fies observable behavior on the part of the 
student that represents not only the goal 
but also the means for telling whether the 
goal has been achieved. In the words of 
Saunders: "To be complete, an instruc- 
tional objective should also contain a state- 
ment of the conditions in which the 
student should be able to do it, and a 
statement of the criteria that will be used 
to judge how well it is done" [134, 1978, 

36 Effective use can be made of video in teacher- 
training programs, which are discussed in a later sec- 
tion of this survey. 

37 A reader of the first draft of this article argued 
that stating objectives deprives students of training 
in distinguishing important from unimportant mate- 
rial. It may be replied: if that is the objective, the 
students should be so informed (otherwise distin- 
guishing the important from the unimportant degen- 
erates into "psyching the professor out") and the 
course should be designed to achieve it. 



p. 68, Saunders's emphasis]. For example, 
"Given a price index with one base year 
indicated, the student will correctly con- 
vert the index numbers to those of an- 
other base year when the new base year 
is specified" [134, p. 72]. Saunders then 
gives three multiple choice questions illus- 
trating how to measure accomplishment 
of this objective. 
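
The behavior named in this objective is a simple proportional rescaling, which the following Python sketch illustrates. It is offered only as a worked example of the arithmetic; the function name `rebase_index` and the index values are hypothetical and do not come from Saunders's test questions.

```python
def rebase_index(index_by_year, new_base_year, base_value=100.0):
    """Convert a price index to a new base year.

    Each value is divided by the index value of the new base year and
    scaled so that the new base year equals base_value.
    """
    new_base = index_by_year[new_base_year]
    return {
        year: base_value * value / new_base
        for year, value in index_by_year.items()
    }


# Hypothetical index with 1967 as the base year (1967 = 100).
index_1967_base = {1967: 100.0, 1972: 125.0, 1977: 150.0}

# The same index restated with 1972 as the base year (1972 = 100).
index_1972_base = rebase_index(index_1967_base, 1972)
```

With these invented figures the rebased values are 80, 100, and 120, so relative price changes between any two years are unchanged by the conversion.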

The goals of a liberal arts education can- 
not, of course, be reduced to a series of 
specific behavioral objectives. There is 
danger that concentration on those tasks 
that lend themselves to such specification 
may divert attention from other impor- 
tant purposes. (See Yates [178, 1978]. Cf. 
Section I.B. above.) Whether specifying 
behavioral objectives is useful is a testable 
hypothesis. The evidence from three stud- 
ies is inconclusive. Dennis L. Nelson [114, 
1970] found positive value from specifi- 
cally stated behavioral objectives; James 
Phillips's study [121, 1972] was inconclu- 
sive; and Cheryl A. Casper [30, 1977] had 
negative results. 

G. Graduate Student Instructors 

Graduate students are used as teachers 
and teachers' aides in most major universi- 
ties in the United States. GSI's may be 
less effective instructors because of their 
inexperience in classroom teaching, com- 
peting demands for their time that take 
priority over their teaching responsibili- 
ties (i.e., their own study for the Ph.D.), 
ignorance of effective teaching techniques 
(which may be related to their inexperi- 
ence), and weaker understanding of the 
substantive material covered in the class- 
room. On the other hand, GSI's may com- 
pensate for their lack of experience by 
their enthusiasm, their efforts to identify 
what the students do not understand, 
their approachability vis-à-vis regular fac- 
ulty, and the greater rapport they fre- 
quently develop with their class [166, 
H. Tuckman, 1975]. 

There have been published evaluations 
of the relative effectiveness of GSI's in 
economics at six institutions — Princeton, 
Hebrew University, Carnegie-Mellon, In- 
diana, Florida State, and Nebraska. The 
six studies all employ the same basic meth- 
odology; each compares the performance 
of students taught by GSIs with students 
taught by regular faculty, controlling for 
other differences that may affect relative 
student performance. 

Wallace Oates and Richard Quandt 
compared the performance of students in 
classes taught by advanced graduate stu- 
dents with that of students in classes 
taught by regular Princeton faculty for 
eight semesters, 1965-69 [116, 1970]. The 
GSI's at Princeton had all completed their 
comprehensive examinations for the 
Ph.D. and attended a weekly meeting of 
all instructors in the course, at which time 
the subject matter for the coming week, 
various techniques for presenting this ma- 
terial, and any particular problems that 
had been encountered were discussed. 
The sample consisted of 2,336 students, 
about two-thirds of whom were taught by 
GSIs. Since the basis for comparison was 
performance on a common final examina- 
tion that varied from semester to semes- 
ter, comparisons were made separately for 
each semester. Oates and Quandt found 
that students of GSIs did better in one 
semester, students of regular faculty did 
better in two semesters, and there was 
no statistically meaningful difference in 
the other five semesters. Regression analy- 
sis to explain relative student perfor- 
mance each semester, controlling for 
grade average in other courses and SAT 
scores, confirmed the conclusion of no sys- 
tematic difference in the performance of 
students of GSIs and students of regular 
faculty. 

Among the GSIs there was enough vari- 
ation in previous teaching experience to 
determine that students of GSI's 
who had more experience did better than 
students of rookie instructors. Since this finding, that 
experience counts, is confirmed in other 
studies, it appears that there are balancing 
effects at work: (1) the lack of experience 
of GSI's hinders the performance of their 
students, while (2) the enthusiasm, ap- 
proachability, interest, etc., of GSI's helps 
their students, so that the net difference 
between GSIs and regular faculty is negli- 
gible. The Princeton experience may not 
be generalizable to the wide variety of 
GSI teaching experiences in the United 
States due to the rather select group of 
GSIs (very advanced students) who teach 
at Princeton. 

Morawetz compared the performance 
of 1,930 undergraduate students in 66 dif- 
ferent classes at the Hebrew University 
from 1967-75 (excluding the war year of 
1973-74) [108, 1977]. Thirty-five of the 
classes were taught by 12 different faculty 
members and 31 were taught by 16 differ- 
ent GSI's. These GSI's had all completed 
at least two years of graduate work in eco- 
nomics but received little guidance in 
teaching. In six of the seven years there 
was no statistically meaningful difference 
in grades on a common final examination 
at even the 10 percent significance level 
between students taught by faculty and 
students taught by GSI's. 38 In the seventh 
year the students taught by faculty scored 
higher, but a regression analysis to control 
for student aptitude demonstrated that 
this effect was due to the superior students 
in regular faculty classes that year. In a 
comparison within the GSI group, Mor- 
awetz found little difference in student 
performance based on the level of experi- 
ence of the GSI's [108, 1977]. 

Saunders compared the performance of 
2,136 students of regular faculty and GSI's 
on the TEU and the TUCE at Carnegie- 
Mellon University (CMU) during 1964-69 
[130, 1971]. Sixty-three classes were 

38 A surprising finding was that the performance 
of students in classes taught by faculty had a higher 
variance than the performance of students in classes 
taught by GSI's. 
staffed by faculty and 28 were staffed by 
graduate students. At CMU the instructors 
were given some advice on teaching, but 
substantially less than Princeton was pro- 
viding. Difference-between-means tests, 
supported by multiple regressions control- 
ling for student attributes, indicated that 
the students of faculty and the students 
of GSI's performed comparably. 

The CMU results were confirmed, with 
minor exceptions, for thirty GSI's teaching 
about half of the 8,895 students in two 
introductory economics courses at Indiana 
University over eight terms, 1971-75 
[133, Saunders, 1975]. Holding other 
things constant, students of the GSI's per- 
formed as well as students of regular fac- 
ulty in the first semester course and did 
significantly better in the second semester 
course. Since most of the GSI's started 
teaching in the first semester course and 
then moved to the second semester 
course, the Indiana results are consistent 
with the hypothesis that teaching ability 
increases with experience. 

Howard Tuckman compared the teach- 
ing of five faculty and three GSI's in 
twelve macroeconomics courses at Florida 
State during 1972-74 [166, 1975]. There 
were 548 students in the study. Tuckman 
used multiple measures of performance 
because GSI's and faculty may differ in 
teaching skills. Using regression analysis, 
he found that years of teaching experience 
was statistically significant in explaining 
differences in both cognitive achievement 
and student attitudes (in the expected di- 
rections), but the effects were small. In- 
serting a binary variable into the regres- 
sions for faculty status, Tuckman found no 
difference between students of faculty 
and GSI's in either cognitive or attitude 
dimensions. However, since faculty status 
and years of teaching are highly corre- 
lated, the difference between GSI's and 
faculty can be assessed only by observing 
the combined effect of years of experience 
and faculty status. When experience was 



removed from the regression model (its 
effect presumably then captured by the 
faculty status variable), the results indi- 
cated that students of GSI's learn less but 
have more favorable attitudes. This differ- 
ence seems to be a function of GSIs' lack 
of experience rather than their status as 
graduate students. 
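
The collinearity problem behind Tuckman's procedure can be illustrated with a small Python sketch. The five-faculty, three-GSI split follows the study as described above, but the years of experience below are invented for illustration and are not Tuckman's data; `pearson_r` is a helper written for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical instructors: five faculty (1) and three GSI's (0).
# The years of teaching experience are assumed values, not study data.
faculty_status = [1, 1, 1, 1, 1, 0, 0, 0]
years_teaching = [12, 8, 15, 6, 10, 1, 0, 2]

r = pearson_r(faculty_status, years_teaching)
```

With figures like these the correlation between the status dummy and years of experience is well above 0.8, so a regression containing both regressors cannot reliably apportion the effect between them. Dropping one lets the other carry the combined effect, which is the interpretation given in the text.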

McConnell [99, 1968] and Lamphear 
and McConnell [87, 1970] compared the 
performance of students of GSI's with stu- 
dents taught in a large televised classroom 
section; some live classes taught by 
McConnell, who has written a highly suc- 
cessful textbook and done considerable re- 
search in economics education; and a class 
taught without lectures at all. The TEU 
showed no difference between students 
of GSI's and faculty members; a test of 
90 questions drawn from the TUCE test- 
bank revealed that GSIs' students per- 
formed worse than the control groups. 
This study did not control for teaching ex- 
perience or attributes of the students in 
the classes. Although McConnell and 
Lamphear concluded that the GSIs' stu- 
dents did worse than the students in the 
alternative classes, this finding is true for 
only one of the two test instruments used. 
If the finding that the students of GSI's 
at Nebraska did worse than the students 
of alternative instructional techniques is 
not explained by better than average in- 
struction in the control group, it might 
be because GSI's at Nebraska in the 1960's 
were not required to have passed their 
Ph.D. comprehensive examinations and 
were given virtually no advice and train- 
ing in teaching. 39 

Evaluations of the effectiveness of sub- 
stituting GSI's for regular faculty in teach- 
ing (mostly principles of) economics reveal 
that: (1) "While there are no doubt better 
and worse teachers, they do not divide 
themselves neatly into two groups with 

39 In the 1970's, however, Nebraska has been one 
of the dozen schools pioneering such training in 
teaching economics. 
the labels of faculty and graduate stu- 
dents" [116, Oates and Quandt, 1970, p. 
138]. (2) Instructor experience seems to 
improve the cognitive performance of stu- 
dents. Coupled with the findings of no dif- 
ference in performance between the stu- 
dents of faculty and GSI's, this suggests 
that GSI's have compensating attributes 
that balance their lack of experience. (3) 
The results are ambiguous on whether un- 
dergraduate students like GSI's more or 
less than faculty. (4) More experienced 
GSI's (and perhaps better trained GSI's, 
if experience is interpreted as a method 
of self-training) seem to be better instruc- 
tors. Those schools which offered special 
training to GSI's on their teaching and/ 
or required more advanced students to 
teach (in which case they had observed 
more teaching themselves) seemed to 
have greater success with GSI's. 

A direct test of the effectiveness of 
teacher training for GSI's was conducted 
by Lewis and Charles Orvis [94, 1973]. 
During the fall of 1971, seven GSI's taught 
3 of the 4 weekly classes of 14 sections 
of principles of economics at the Univer- 
sity of Minnesota. The GSI's were given 
no help in instruction. Students were pre- 
and post-tested using the TUCE, and data 
on student characteristics and attitudes 
were collected for all students. The same 
seven instructors were then used in the 
winter 1972 quarter to teach another 14 
classes. During this quarter the seven 
GSI's were exposed to an integrated 
teacher training program (TTP), consist- 
ing of student evaluations and feedback 
to the instructors, videotaped classroom 
observations, and instructional seminars, 
which had been developed for training 
and assisting graduate student teachers. 
Tests on pre-test scores and student attri- 
butes indicated that the control (fall 1971) 
and experimental (winter 1972) groups of 
students were similar. 

Mean post-TUCE scores and change in 
TUCE scores were significantly higher for 



the winter quarter students than for the 
fall quarter students. Multiple linear re- 
gression analysis holding constant pre- 
TUCE, ACT score, grade point average, 
age, sex, and instructor's evaluation score 
confirmed the superior performance of 
students when the TTP was instituted. 
The data also revealed that the GSI train- 
ing system had a significant favorable in- 
fluence on instructors' ratings (on the Pur- 
due Rating Scale). 

The main problem with the Lewis-Orvis 
analysis is the possibility that the increased 
experience of the GSI's from fall to winter 
quarters is what actually caused the im- 
provement in their teaching performance 
rather than the TTP. Otherwise the con- 
trols between the fall and winter quarter 
were excellent. 

The evidence from the Minnesota expe- 
rience, as well as the relatively better per- 
formance of GSI's at those schools that of- 
fer more advice on teaching, seems to 
suggest that at least a moderate amount 
of training may be effective. 

The costs of GSI training programs ap- 
pear to be quite modest. The Minnesota 
program, as well as other programs devel- 
oped at Wisconsin, Harvard, Purdue, and 
a dozen or so other universities, requires 
relatively little graduate student time. To 
further reduce the costs of teacher train- 
ing of GSI's, the JCEE, with the financial 
assistance of the Sloan Foundation, has de- 
veloped a Resource Manual for Teacher 
Training Programs in Economics [137, 
Saunders, Welsh, and Hansen, 1978], 
which provides the materials necessary 
for conducting a TTP workshop. 

V. Lasting Effects of 
Economics Education 

The value of liberal education does not 
consist solely, or even mainly, of the 
knowledge retained. But the case for spe- 
cial emphasis on economics rests on its 
high marginal social product in a mixed 
economy, the functioning of which de- 
pends on government policies strongly af- 
fected by public opinion. 40 The extent to 
which economics training has lasting ef- 
fects is, therefore, an important question 
to investigate. 

That a college course in elementary eco- 
nomics has any lasting impact on the abil- 
ity of citizens to deal intelligently with 
policy issues was challenged by Stigler, 
who advanced an interesting hypothesis: 
"Select an adequate sample of seniors (I 
would prefer men five years out of col- 
lege), equally divided between those who 
have never had a course in economics and 
those who have had a conventional one- 
year course. Give them an examination 
on current economic problems, not on 
textbook questions. I predict they will not 
differ in their performance" 41 [159, 1963, 
p. 657]. The kind of examination Stigler 
had in mind was not a current events quiz 
but an analytic exercise. 

Two major studies and some minor 
ones 42 have shed light on the lasting ef- 

40 In Section II C above, McKenzie was quoted as 
saying that people do not have enough incentive 
to maintain the human capital acquired through eco- 
nomic education and in any event will vote their 
own private interests [103, 1977, p. 10]. But an edu- 
cation should develop intellectual interests and at 
least lead people to a rational understanding of what 
their private interests are. 

41 This hypothesis was the outgrowth of Stigler's 
criticism of the elementary course quoted in the in- 
troductory section of this survey. 

42 A number of studies have been made of the 
lasting effects of high school economics (Bach and 
Saunders [14, 1966]; Moyer and Paden [111, 1968]; 
Saunders [128, 1970]; Weidenaar and Joe A. Dodson, 
Jr. [170, 1972]; C. D. Harbury and R. Szreter [63, 
1968]; Attiyeh and Lumsden [9, 1971; 10, 1972]; 
Palmer et al. [120, 1979]). Stuart Wells characterized 
the results as "inconclusive" [172, 1974, p. 9]. Saun- 
ders and Bach in a study of 96 seniors who had taken 
a one-semester required course in economics as 
sophomores at Carnegie-Mellon University found 
that for those who took no more economics, the aver- 
age score on the TEU declined a little over 10 per- 
cent ceteris paribus (about half the value added by 
the introductory course) but was still markedly 
higher than before taking any economics [135, 1970]. 
As Bach and Saunders noted, the TEU is extremely 
simple relative to the content of the Carnegie-Mel- 
lon course. 

In a forthcoming paper, Andrew I. Kohen and Paul 



fects of college economics courses. In the 
case of Bach and Saunders, the findings 
were part of a broader study [13, 1965]. 
In a multiple regression analysis of data 
from a 25-item version of the TEU taken 
by 3,966 high school teachers of social 
studies, Bach and Saunders found that 
having taken one or two economics 
courses in college did not add significantly 
to the teachers' scores [13, 1965, p. 349]. 
Having taken three or four college courses 
added an amount that was statistically sig- 
nificant but quantitatively small relative 
to other variables. Five or more courses 
added a quantitatively significant amount 
to the score, but not nearly as much as 
having watched the television course, 
"The American Economy," three or more 
times a week during the preceding year. 
The test instrument used was not appro- 
priate for investigating Stigler's hypothe- 
sis. 

The one major study devoted mainly 
to lasting effects was carried out by Saun- 
ders [129, 1971; 132, 1973]. He used multi- 
ple regression analysis to compare three 
pairs of students or alumni. In each pair, 
one group had had an introductory college 
course in economics whereas the other 
had not. One pair included a set of sopho- 
mores who had just completed the intro- 
ductory course and a set who had not; the 
second of seniors, one group of which had 
had the course two years earlier; the third 
of alumni five years after graduation. The 
measure of output was a hybrid version 
of the Test of Understanding in College 
Economics consisting of 33 questions cho- 



H. Kipps estimate the decay rate of microeconomic 
knowledge acquired in an introductory course at 
James Madison University as 20 percent per year 
[85, 1979]. In another forthcoming paper, Eleanor 
D. Craig, James B. O'Neill, and Douglas W. Elmer 
of the University of Delaware find that class size 
did not affect retention [38, 1978]. (This is contrary 
to an earlier finding by Craig and O'Neill [37, 1978]). 
They also report that "the juniors who were fresh- 
men during the course, outperformed the seniors 
(who had been sophomores)" [38, 1978, p. 2]. 
sen from the 132 in the TUCE to avoid 
technical analysis and specialized termi- 
nology. Twenty-three schools participated 
with a total of usable responses of 1,220 
for sophomores, 955 for seniors, and 1,257 
for alumni. Each respondent answered a 
detailed questionnaire, which provided 
information about variables hypothesized 
to be associated with differences in per- 
formance on the hybrid TUCE, interest 
in economics, and reading habits. 

Saunders found that introductory eco- 
nomics courses did have a lasting impact 
on test performance. It diminished over 
time. Other things remaining the same, 
sophomores with an introductory course 
in economics scored 6.18 points higher on 
the hybrid TUCE than comparable sopho- 
mores without such a course; seniors who 
had had such a course two years previ- 
ously scored 4.76 points higher than the 
corresponding control group; and alumni 
five years out of college scored 3.24 points 
higher if they had had an introductory 
course. Other things constant, each letter 
grade in introductory economics was sig- 
nificantly associated with a difference in 
total TUCE score of 2.00 for sophomores, 
1.69 for seniors, and 1.09 for alumni. Each 
course taken beyond the introductory 
level was associated with a difference in 
test performance of .53 points in the sen- 
ior sample and .48 in the alumni sample. 
A difference of one letter grade in each 
such course was associated with test score 
differences of .18 and .15 for seniors and 
alumni respectively. Among variables 
consistently and significantly associated 
with test scores in all three samples were 
the "intellectualism" of a school's student 
body, general interest of the person in 
economics as a subject, and reading the 
economics or business section of a weekly 
news magazine. Introductory economics 
courses and course grades did not ap- 
pear to have a lasting impact on reported 
general interest in economics as a sub- 
ject. 
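
As a rough arithmetic check on how quickly the advantage fades, the three point differences reported above can be expressed as shares of the sophomore difference. The sketch below uses only the figures quoted in the text. Treating the three cross-sectional samples as if they were one cohort followed over time is an assumption made purely for illustration, and the function name is hypothetical.

```python
# Point advantages on the 33-question hybrid TUCE, as reported in the text.
gains = {"sophomores": 6.18, "seniors": 4.76, "alumni": 3.24}

def retained_share(gains, baseline="sophomores"):
    """Return each group's advantage as a fraction of the baseline group's."""
    base = gains[baseline]
    return {group: value / base for group, value in gains.items()}

shares = retained_share(gains)
for group, share in shares.items():
    print(f"{group}: {share:.0%} of the sophomore advantage")
```

On these figures, seniors retain roughly three-quarters of the sophomore advantage and alumni roughly half.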



Two limitations of Saunders's study 
should be noted. Since he used a cross sec- 
tion of three sample groups at a single 
time rather than a longitudinal survey, the 
results may be suspect if the content or 
quality of the courses changed from the 
alumni group to the sophomores sampled. 
The use of questionnaires mailed to 
alumni has the usual response bias. 

Despite the limitations, Saunders's 
study is one of the most important in the 
field of economics education. Though it 
did not test Stigler's hypothesis directly, 
it clearly implies that introductory college 
courses do result in a lasting increase in 
economics understanding. The difference 
between his results and the earlier study 
of Bach and Saunders may be attributed 
to the fact that one-third of those in the 
earlier study took their economics in 
teachers colleges, where the courses may 
have been inferior [13, Bach and Saun- 
ders, 1965, pp. 351-52]. It may also be 
associated with the different test instru- 
ment used, though the results of Saunders 
and Bach [135, 1970] indicate otherwise. 
As Bach and Saunders said of their results, 
"These findings emphasize again the well- 
known psychological principle that 'learn- 
ing' unsupported by motivation and rein- 
forcement . . . has a very short half-life" 
[13, 1965, p. 354]. 

VI. Summary and Conclusions 

A. What Have We Learned? 

The research on teaching college eco- 
nomics is voluminous. A substantial 
amount has been of good quality, some 
of it of high quality. So what have we 
learned? 

• Different students learn economics in 
different ways. The best teaching strategy 
provides alternative learning methods di- 
rected toward the different needs of dif- 
ferent students. 

• Of the new teaching methods, the most 
effective seems to be computer-study- 
management programs. In this method, 
the students take frequent tests and are 
given different assignments depending on 
the test results. 

• Programmed learning is efficient in the 
sense of bringing students to a given level 
of competence in less time, but generally 
students do not like it. 

• Students like self-paced instruction, and 
it increases learning in some circum- 
stances. 

• Graduate students generally are just as 
good teachers as regular faculty even 
though, other things equal, experience re- 
sults in better teaching. 

• Graduate students who have had 
teacher training are better instructors 
than those who have not. 

• A one-year course in elementary eco- 
nomics has lasting effects in the form of 
greater economic competency. 

• Computerized games may be fun but 
they do not seem to be worth the cost. 

B. How Can Research Methodology Be 
Improved? 43 

Modeling. Economics education is grop- 
ing for a formal model, or set of models, 
of the instructional process. In the past 
much research in the field consisted of a 
college instructor regressing final exami- 
nation scores for students on an ad hoc 
set of easily obtained student characteris- 
tic variables and a variable indicating 
whether the student participated in a 
certain pedagogical experiment. Little 
thought was given to model specification, 
and functional form was determined arbi- 
trarily. 

More recently, McKenzie and Staaf 
[104, 1974] and Daniel Graham and Kel- 
ley [55, 1974] have made progress toward 
exploiting the rich theoretical models of 
microeconomics, which explicitly con- 
sider behavioral reactions of students and 

43 An excellent survey of conceptual and empirical 
issues in estimating educational production functions 
is Eric Hanushek [62, 1979]. 



faculty to changes in prices and budget 
constraints. The application of such mod- 
els promises to bear fruit, as the empirical 
literature on student and faculty behavior 
has already indicated. However, a sub- 
stantial gap remains between theorizing 
and empirical research. Progress can be 
made by synthesizing and integrating the 
empirical work with the insights that are 
available from comprehensive theoretical 
models based on maximizing behavior. 

Many of the alleged independent varia- 
bles in economics education production 
function studies are endogenous. This has 
been recognized recently (Allison [2, 
1976; 3, 1977]; Soper [153, 1976]; Craig 
Swan [160, 1978]; Becker and Salemi [18, 
1977]), but most research conclusions con- 
tinue to rest on single equation models. 
There are studies that use grades to pre- 
dict course evaluation ratings, course eval- 
uation ratings to predict grades, cognitive 
performance to predict course evaluation 
ratings, course evaluation ratings to pre- 
dict cognitive performance, and grades to 
predict cognitive performance; and while 
there do not seem to be any explicit stud- 
ies, everyone presumes (hopes) that cogni- 
tive performance affects grades. The need 
to develop a simultaneous equations 
model of this process is obvious. 
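
The consequence of ignoring this simultaneity can be illustrated with simulated data. The sketch below assumes a hypothetical two-equation system in which grades and course ratings affect each other, and in which a measure of student ability shifts grades but has no direct effect on ratings. Every coefficient and variable name is an assumption chosen for illustration, not an estimate from any study cited here. It compares a single-equation least-squares slope with a simple instrumental-variable slope.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
ability = rng.normal(0.0, 1.0, n)   # assumed to affect grades but not ratings directly
u = rng.normal(0.0, 1.0, n)         # unobserved influences on ratings
v = rng.normal(0.0, 1.0, n)         # unobserved influences on grades

# Assumed structural system (all coefficients are illustrative):
#   rating = 0.5 * grade  + u
#   grade  = 0.5 * rating + 1.0 * ability + v
# Solving the two equations jointly gives the reduced form for grade.
grade = (ability + v + 0.5 * u) / (1.0 - 0.5 * 0.5)
rating = 0.5 * grade + u

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

ols_slope = cov(grade, rating) / cov(grade, grade)
iv_slope = cov(ability, rating) / cov(ability, grade)   # ability as instrument

print("true effect of grade on rating: 0.5")
print("single-equation OLS estimate  :", round(ols_slope, 2))
print("instrumental-variable estimate:", round(iv_slope, 2))
```

Under these assumptions the single-equation estimate overstates the effect of grades on ratings because grades are correlated with the rating equation's error term, while the instrumental-variable estimate is close to the value built into the simulation.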

Most of the evaluations of teaching in- 
novations have been specified so that the 
treatment group enters the model as a 
shift parameter. This ignores the possibil- 
ity that the effect of the alternative tech- 
nique may be to improve the marginal 
product of some other factor that is al- 
ready included in the equation. For exam- 
ple, by using separate equations, Siegfried 
and Strand [144, 1977] found that self- 
paced instruction was more effective for 
females than for males. Utilizing Blinder's 
method [19, 1973], Chizmar, Hiebert, and 
McCarney [31, 1977] found that users of 
computer-assisted instruction learned 
more economics in spite of CAI (rather 
than because of it). 
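
The difference between a shift parameter and a change in the marginal product of another input can be sketched with simulated data. In the hypothetical example below the experimental method adds nothing for a student of zero ability but raises the payoff to ability. The variable names and all numerical values are assumptions for illustration and are not taken from the studies cited.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ability = rng.uniform(0.0, 10.0, n)          # hypothetical aptitude measure
treated = (np.arange(n) % 2).astype(float)   # 1 = experimental section
# Assumed data-generating process: the method adds nothing at zero ability
# but raises the payoff to ability from 2.0 to 3.5.
score = 10.0 + 2.0 * ability + 1.5 * treated * ability + rng.normal(0.0, 1.0, n)

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

shift_only = ols([ability, treated], score)
with_interaction = ols([ability, treated, treated * ability], score)

print("shift-only model, treatment coefficient:", round(shift_only[2], 2))
print("interaction model, treatment x ability :", round(with_interaction[3], 2))
```

The shift-only specification reports a single average gain and hides the fact that, in this simulated case, the gain grows with ability. The interaction specification recovers the change in slope.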

One of the better modeling efforts is 
the Harvard project [3, Allison, 1977]. It 
constructs and estimates a three-equation 
model that adapts to the non-linearities 
and simultaneity inherent in the educa- 
tional process. The model includes an 
equation describing student decisions 
about the allocation of time and effort; a 
production function equation that relates 
student effort, ability, and pedagogy to 
achievement; and a "profit function," re- 
lating student effort and achievement to 
student enjoyment of the course. Esti- 
mates from a constant elasticity of substi- 
tution form of the production function re- 
veal "elasticities" of achievement with 
respect to student ability, pedagogic in- 
puts, and effort of roughly .89, .40, and 
.25. 

Sample Design. Research in economics 
education would be more credible if eval- 
uation of innovative teaching technologies 
were conducted by individuals other than 
those who devise the new methods. 

Too many studies in economics educa- 
tion have been one-school, one-time ex- 
periments. There are examples of large, 
broad-based data sets (Attiyeh, Bach, and 
Lumsden [6, 1969]; Attiyeh and Lumsden 
[9, 1971]; Saunders [132, 1973]), but they 
are rare. The single institution studies do 
not provide sufficient observations and 
sufficient variation in many variables re- 
quired to disentangle the complicated re- 
lationships inherent in the educational 
process. 

The main deficiency of the evaluations 
of innovative learning technologies, how- 
ever, may involve the assignment of stu- 
dents to experimental and control groups. 
One rationale for the identification and 
evaluation of alternative pedagogical 
techniques is that efficient student-learn- 
ing processes vary across individuals and 
the opportunity to select among alterna- 
tive methods of learning permits students 
to choose the technique best suited to 
each individual, thus facilitating efficiency 



in student learning. This suggests that the 
ultimate objective is not to entirely re- 
place the conventional lecture format 
with alternative methods, but rather to 
offer options to (presumably well-in- 
formed) students. 44 If this is true, then the 
gain in learning will accrue to those stu- 
dents who would elect the experimental 
option and learn more in their courses 
than they would have learned in the con- 
ventional format had the experimental 
option been unavailable. Therefore, the 
experimental group should consist of stu- 
dents who voluntarily elect the experi- 
mental method and the control group 
should consist of students who would vol- 
untarily elect the experimental technol- 
ogy but are actually in the conventional 
course. 

Neither of the two methods typically 
used to identify control and experimental 
students (random assignment or volun- 
tary self-selection) satisfies these criteria. 
With randomly assigned experimental 
and control students, some individuals are 
assigned to the experimental class who 
would have elected the conventional class 
with unconstrained choice. This implies 
that they believed a priori that the con- 
ventional format was more conducive to 
their learning of economics. Such people 
would not be found in the experimental 
course if it were offered as a regular op- 
tion, and their inclusion in the experimen- 
tal group confounds the empirical test. A 
similar argument can be made with re- 
spect to students who would elect the al- 
ternative technology if confronted with 
choice, but who are assigned (randomly) 
to the conventional format in the experi- 
ment. The performance of both groups 
may be worse than it would be if all stu- 
dents could select the personally most 

44 On the other hand, economies of scale relative 
to demand at some colleges may dictate the choice 
of only one pedagogical technique for economics in- 
struction, in which case the usual random assignment 
of students to experimental and control groups is a 
satisfactory procedure. 
suitable technology. Unfortunately, biases 
may not balance, and the empirical results 
may consequently provide misleading sig- 
nals as to how useful the alternative tech- 
nology would be vis-à-vis the conven- 
tional format for those students who 
would elect the new method under free 
choice. 45 

If (informed) students are allowed to 
voluntarily choose whether to enroll in 
the experimental or control classes, we 
can infer that those choosing the alterna- 
tive technology believe a priori that it 
would be helpful to them. So the experi- 
mental group includes the correct individ- 
uals. However, the control group then 
consists of students who have revealed 
that they would not elect the alternative 
method even if it were offered to them 
(as it was). Thus the control group does 
not provide any guidance as to how those 
who elect the new method would have 
performed in conventional classes if the 
alternative had not been offered. 

In experiments to evaluate alternative 
pedagogical techniques it is necessary to 
identify students who would choose the 
experimental technology, and then divide 
them into an experimental and control 
group. 46 
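
The assignment rule just described can be sketched as follows. This is a minimal illustration that assumes each student's preference has already been elicited, for example by a survey before registration. The function name, the identifiers, and the preference data are all hypothetical.

```python
import random

def assign_among_volunteers(students, wants_experimental, seed=0):
    """Split only the students who would elect the new method.

    students           -- list of student identifiers
    wants_experimental -- dict mapping identifier -> True/False preference
    Returns (experimental, control, excluded).
    """
    volunteers = [s for s in students if wants_experimental[s]]
    excluded = [s for s in students if not wants_experimental[s]]
    rng = random.Random(seed)
    rng.shuffle(volunteers)           # random split within the volunteer pool
    half = len(volunteers) // 2
    return volunteers[:half], volunteers[half:], excluded

students = [f"s{i}" for i in range(10)]
prefs = {s: i % 3 != 0 for i, s in enumerate(students)}   # hypothetical survey
experimental, control, excluded = assign_among_volunteers(students, prefs)
print(experimental, control, excluded)
```

Students who would not elect the new method appear in neither comparison group, so the control group consists only of students who wanted the experimental course but were taught conventionally.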

Measuring Outputs and Inputs. Many 
of the variables used to measure either 
inputs or outputs are confined to a small 
number of values. For example, usually 
there are only five possible course grades. 
Course evaluations commonly use a scale 
of 1 to 5. Often the underlying basis for 
such variables is an ordinal ranking, in 
which case (for independent variables) a 

45 A rigorous demonstration of the sample bias in 
most economics education research is contained in 
Siegfried and George Sweeney [145, 1979]. 

46 This experimental design runs the risk of bias 
from a "negative Hawthorne effect." There is danger 
that students assigned to the control group, knowing 
and preferring the experimental pedagogy, might 
perform below normal because they are dissatisfied 
at being denied the opportunity to learn in the ex- 
perimental course. 



series of binary variables would be supe- 
rior to arbitrarily imposing a cardinal rela- 
tionship on the ranking (e.g., assuming 
that an A is worth twice a C). 
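
The two codings can be contrasted in a short sketch. It assumes a five-letter grading scale and uses hypothetical function names. The first function imposes a cardinal scale, and the second produces the series of binary variables recommended above, with one grade omitted as the reference category.

```python
GRADES = ["F", "D", "C", "B", "A"]  # ordered, lowest first

def cardinal(grade):
    """Impose an arbitrary cardinal scale: F=0, D=1, C=2, B=3, A=4."""
    return GRADES.index(grade)

def binary_indicators(grade, omitted="F"):
    """One 0/1 variable per grade, leaving out a reference category."""
    return {f"grade_{g}": int(grade == g) for g in GRADES if g != omitted}

print(cardinal("A"), cardinal("C"))   # the 0-4 scale makes an A "twice" a C
print(binary_indicators("B"))
```

With the binary coding each grade receives its own coefficient in a regression, so the data rather than the coding determine how far apart adjacent grades are.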

A more serious problem, however, 
haunts the measurement of output and 
input. Many variables have upper and 
lower limits (e.g., ceiling and floor effects 
of "gap-closing" measures of cognitive 
achievement), in which case the error dis- 
tribution is truncated, causing heteroske- 
dasticity and biased parameter estimates 
if ordinary least squares regression is used 
[70, Thomas Johnson, 1979]. The practical 
importance of this limitation has recently 
been demonstrated by Lee Spector and 
Michael Mazzeo [157, 1978] who, in a 
study of self-paced instruction, found that 
the effect of a PSI introductory course on 
student grades in a later course was statis- 
tically significant and positive using ordi- 
nary least squares. When they adopted the 
more appropriate probit estimation tech- 
nique, the null hypothesis (no effect of PSI) 
could no longer be rejected. There are 
now available several economical tech- 
niques for handling limited-value depen- 
dent variables [70, Johnson, 1979]. 
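
The direction of the bias from a ceiling can be seen in a small simulation. The sketch below is not the probit analysis described above. It only shows, under assumed values, how ordinary least squares behaves when scores are capped at a maximum. The input variable, the ceiling of 40 points, and the coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
effort = rng.uniform(0.0, 10.0, n)                  # hypothetical input
latent = 20.0 + 3.0 * effort + rng.normal(0.0, 5.0, n)
CEILING = 40.0                                      # assumed maximum test score
observed = np.minimum(latent, CEILING)              # scores cannot exceed the cap

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print("slope on uncapped scores:", round(slope(effort, latent), 2))
print("slope on capped scores  :", round(slope(effort, observed), 2))
```

The regression on capped scores understates the slope built into the simulated data. Estimators designed for limited dependent variables address this by modeling the limit explicitly, and they are not implemented here.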

The major difficulty, however, with the 
measurement of outputs is the failure of 
most economics education research to rec- 
ognize that students may elect to use their 
efficiency gain from a more efficient 
method of teaching economics to "pur- 
chase" additional knowledge in some 
other discipline (or perhaps "purchase" 
leisure). Such gains are overlooked when 
conventional measurement techniques 
are used. The empirical research on study 
time comes closest to recognizing this 
problem. 

Multicollinearity. The ad hoc nature of 
most empirical investigations of the pro- 
duction function for economics education 
has blurred the distinction between con- 
ceptualizing a model and measuring the 
variables in it. Consequently multiple 
measures of single concepts often appear 
in one equation, causing multicollinearity 
and substantive errors in interpretation. 47 
Controlling for all of the obvious factors 
that learning theory predicts to affect out- 
put is important for reducing bias in the 
estimated parameters. However, a deci- 
sion must be made when there are dupli- 
cate measures available for one factor. 

If and when theory provides little or 
no guidance as to which of several alterna- 
tive measures is superior and more than 
one is included in the model, then the 
group of measures should be tested jointly 
with an F-test. It is likely that this ap- 
proach would explain the apparent para- 
dox of so many standard variables being 
insignificant in empirical studies of eco- 
nomics education. A joint F-test on a 
group of variables is also appropriate 
when a change in one variable necessarily 
causes a change in another variable, since 
the impact of changing the one variable 
alone is then irrelevant. For example, if 
an achievement equation contains a bi- 
nary variable for an experimental teach- 
ing method and an interaction term that 
is the product of the experimental binary 
variable and, say, SAT scores, it makes no 
sense to consider the separate effect of 
the first variable alone, since changing the 
teaching method would necessarily 
change the value of the interaction term 
also. 
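
A joint test of this kind compares the residual sum of squares with and without the group of variables. The sketch below computes the F statistic for two simulated, nearly duplicate measures of one underlying aptitude. The data, the variable names, and the helper functions are assumptions for illustration.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def joint_f(X_restricted, X_full, y):
    """F statistic for the columns in X_full that are absent from X_restricted."""
    n, k = X_full.shape
    q = k - X_restricted.shape[1]          # number of restrictions tested
    ssr_r, ssr_u = ssr(X_restricted, y), ssr(X_full, y)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))

rng = np.random.default_rng(2)
n = 100
aptitude = rng.normal(0.0, 1.0, n)                 # unobserved concept
measure_a = aptitude + rng.normal(0.0, 0.1, n)     # e.g. a verbal score
measure_b = aptitude + rng.normal(0.0, 0.1, n)     # e.g. a math score
achievement = aptitude + rng.normal(0.0, 1.0, n)

const = np.ones((n, 1))
X_full = np.column_stack([const, measure_a, measure_b])
f_stat = joint_f(const, X_full, achievement)
print("joint F for both aptitude measures:", round(f_stat, 1))
```

With 2 and 97 degrees of freedom the 5 percent critical value is about 3.1, so a statistic well above that indicates that the pair of measures matters jointly even when neither coefficient is individually significant.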

Multicollinearity has clearly been rec- 
ognized in economics education research 
(Soper [150, 1973; 153, 1976]; Becker [16, 
1976]; Highsmith [67, 1976]; Swan [160, 
1978]). The continuing debate centers on 

47 Describing production function studies of edu- 
cation in general, Hanushek argues that ". . . multi- 
collinearity does not appear to be the villain it has 
been made out to be, although it may partially ex- 
plain some of the apparent inconsistencies in existing 
research. . . . The usual terminology for regression 
analysis is misleading here: Right hand variables are 
often called independent variables, but this does not 
imply that they cannot be correlated. In fact, multi- 
ple regression analysis is used because there are cor- 
relations among the 'independent' variables" [62, 
1979, pp. 351-68]. 



what to do about it. 48 In addition to elimi- 
nating extraneous measures of single fac- 
tors and jointly testing multiple measures 
of a single factor for statistical significance, 
it may be possible to incorporate new in- 
formation into the model to circumvent 
the problem. For example, if in a time- 
series regression of Y on X_1 and X_2, X_1 
and X_2 are highly collinear, one might 
turn to cross-sectional data where, luckily, 
X_1 is constant across observations. The re- 
lationship between X_2 and Y can then be 
estimated from the cross-sectional data 
and the time-series analysis used to esti- 
mate the coefficient of X_1 while constrain- 
ing the coefficient of X_2 to the value deter- 
mined from the cross-sectional analysis. 
Another new technique for dealing with 
multicollinearity is "ridge regression." Its 
advantages, however, come at considera- 
ble cost [70, Johnson, 1979]. 49 
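
The mechanics of ridge regression can be shown in a few lines. The sketch below uses the textbook closed form, in which a constant k is added to the diagonal of X'X before solving. The simulated data, the value k = 1, and the function name are assumptions for illustration only.

```python
import numpy as np

def ridge(X, y, k):
    """Closed-form ridge estimate: (X'X + kI)^-1 X'y; k = 0 gives OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(0.0, 1.0, n)
x2 = x1 + rng.normal(0.0, 0.05, n)      # nearly a copy of x1
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)                  # center so no intercept is needed
y = x1 + x2 + rng.normal(0.0, 1.0, n)
y = y - y.mean()

beta_ols = ridge(X, y, 0.0)
beta_ridge = ridge(X, y, 1.0)
print("OLS  :", np.round(beta_ols, 2))
print("ridge:", np.round(beta_ridge, 2))
```

The ridge coefficients are pulled toward zero relative to least squares. That shrinkage is the source of both the smaller variance and the bias, and the size of the bias depends on the unknown true coefficients.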

Our criticisms of research methodology 
certainly do not apply without exception. 
Many studies have handled these prob- 
lems well. For example, Attiyeh, Bach, 
and Lumsden employed a sample of 4,121 
students at 48 different colleges and uni- 
versities to estimate a production function 
for economics education [6, 1969]. Allison 
devised and estimated a simultaneous 
equations model of the education pro- 
duction function for introductory eco- 
nomics at Harvard using over 2,400 
student observations [3, 1977]. Her study 
included a model of student time alloca- 
tion based on utility maximization and al- 
lowed for non-linearities (i.e., diminishing 
marginal returns to certain factors) and 

48 Some of the earlier studies of economics educa- 
tion production functions occasionally used stepwise 
regression to choose among a plethora of variables. 
The level of sophistication of research methods has 
now taken us beyond such thoroughly discredited 
techniques for hypothesis testing. 

49 Ridge regression changes some of the (diagonal) 
values in the regression matrix. This reduces the 
standard errors of the estimated coefficients, provid- 
ing narrower confidence intervals. However, it also 
introduces unknown biases into the point estimates 
of the parameters. 
differential effects on different types of 
students. Becker and Salemi evaluated an 
audiovisual tutorial package using data 
from six colleges [18, 1977]. They explic- 
itly considered a nonlinear theoretical 
model of learning; modeled the ceiling ef- 
fect of the gap-closing cognitive achieve- 
ment measure; corrected for simultaneous 
equations bias; controlled for variable in- 
puts (i.e., student study time); specified in- 
teraction terms, which permitted the 
identification of beneficiaries of the tuto- 
rial package; and analyzed the marginal 
costs of the program and attempted to 
weigh them against the marginal benefits. 

The field of economics education is a 
teenager, experiencing the growing pains 
of adolescence. The rapid improvement 
in research methodology being applied to 
problems of teaching in economics should 
not be surprising in view of this youth. 
Indeed, what is remarkable is the progress 
that has been made in such a relatively 
short span. 

C. Where Do We Go From Here? 

There are two major thrusts that should 
be undertaken to complement the im- 
provements in research methods dis- 
cussed above. First, many of the good one- 
school studies need to be replicated else- 
where. The evidence from Attiyeh, Bach, 
and Lumsden [6, 1969], that different 
types of schools matter, and from Chizmar 
et al. [31, 1977], that the effect of teaching 
methods on economics learning may de- 
pend on the level of human capital availa- 
ble, suggests the importance of replica- 
tion. In addition, there is danger that the 
innovator invested substantially more 
time in teaching an experimental course, 
in which case advantages attributed to a 
new method actually may be returns to 
faculty effort. This danger is reduced if 
the innovation is implemented and evalu- 
ated by someone other than the initiator. 
Finally, replication will add to the sample 
size and improve confidence in the objec- 



tivity of assessments of teaching methods. 
The most effective means of gaining large 
sample sizes and sufficient variation in ex- 
planatory factors to permit sophisticated 
modeling while minimizing the danger of 
variation in the implementation of teach- 
ing methods is large-scale research pro- 
jects similar to those of Attiyeh, Bach and 
Lumsden [6, 1969], Saunders [129, 1971; 
132, 1973] or Attiyeh and Lumsden [9, 
1971]. 

Second, most of the research in econom- 
ics education has been concerned with the 
college principles of economics course, 
and rightly so. However, there are other 
important vehicles of economics educa- 
tion, and perhaps the early marginal prod- 
ucts from an assessment of them exceed 
the diminished marginal product of yet 
another twist on programmed instruction, 
televised lectures, or computer games. 
For example, most of the economics edu- 
cation in the United States comes through 
the popular press. Very little assessment 
has been made of advertising campaigns 
by corporations, trade association educa- 
tion (propaganda?) efforts, or press cover- 
age of economic events. What is the im- 
pact of the PBS series "Economically 
Speaking" or "The Age of Uncertainty"? 
What is the quality of economics reason- 
ing among the business and economics 
staffs of local newspapers? 

Besides the principles course there is 
the economics major. While there have 
been a few studies of upper division 
courses (economic statistics, intermediate 
theory, money and banking), for the most 
part these courses have been neglected 
by research. This may be due to the ab- 
sence of clearly defined goals for the eco- 
nomics major, of which these courses are 
usually a part. Since undergraduate ma- 
jors in economics pursue diverse careers — 
law school, business school, graduate 
school in economics (and other disci- 
plines), business employment, govern- 
ment employment, entrepreneurship — it 
is not obvious how the undergraduate ma- 
jor curriculum should be structured. In- 
deed, there is substantial disagreement on 
whether it should be career oriented at 
all. 

Data on undergraduate instruction in 
economics at U.S. colleges and universities 
could be useful in planning and adminis- 
tering Ph.D. programs, since undergradu- 
ate majors are an important component 
in the derived demand for Ph.D.'s [138, 
Scott, 1979]. Before research proceeds to 
courses beyond elementary economics, 
we need to know how many people annu- 
ally enroll in which courses and for what 
purposes. 50 The debate on course objec- 
tives, while unsettled for the principles 
course, may be highly controversial for 
the set of courses that constitute a major. 

There are many other important areas 
for research on teaching college econom- 
ics. For example, further refinement of 
the input coefficients in the production 
function will be useful as colleges and uni- 
versities begin belt-tightening in response 
to the diminishing college age population 
in the next decade. Research findings may 
help college administrators to decide be- 
tween reducing budgets by increasing 
class size or by hiring younger, and less 
experienced, instructors. 

In spite of rapid growth in alternative 
pedagogies, the blackboard and textbook 
remain the staple inputs into the process. 
But we know embarrassingly little about 
how differences among textbooks affect 
the learning and attitudes of students. We 
have done little detailed analysis of the 
generalizable attributes of classroom 
teaching behavior. The JCEE's Teacher 
Training Program contains many helpful 

50 A count of professors teaching various courses 
by a company that sells mailing lists to publishers 
listed 6,130 for principles of economics, 1,671 for 
intermediate micro theory, 1,408 for money and 
banking, 1,404 for intermediate macro theory, and 
859 for labor economics as the five most popular 
course areas in 1978-79 [33, College Marketing 
Group, n.d.]. 



hints to improve lectures, but to date 
there has been little, if any, systematic re- 
search to verify their effectiveness. 

It appears that the standard production 
function studies of college teaching have 
been unable to explain much of the varia- 
tion in measured outputs. This may be be- 
cause the inputs or outputs have been 
measured inadequately, or because the 
model was specified poorly. An alternative 
explanation is that the important determi- 
nants of variations in student learning are 
at the micro level — individual students 
have such different learning processes 
that generalization is almost futile. From 
the more general research on education, 
David Armor et al. [5, 1976] explain the 
apparent ineffectiveness of alternative 
pedagogies by the substantial variation in 
implementation of them at the classroom 
level. Therefore, we need to learn more 
about the process by which instructors 
adopt teaching methods and tailor them 
to individual students. 

D. Conclusion 

A cumulative literature on economics 
education has now developed. As in other 
subfields of economics, those who would 
publish must search the literature for pre- 
vious findings and build on them. They 
must also bring to bear the tools of eco- 
nomic theory and econometrics. The qual- 
ity of the research done so far varies 
widely, but dramatic improvement has oc- 
curred in recent years; much more of the 
current research is first rate than was the 
case 15 years ago. 

An economist can now make a decent 
living by specializing in economics educa- 
tion. True (perhaps unfortunately for un- 
dergraduate students), it is not a prestige 
subject in high demand. A young econo- 
mist wanting to win the Clark medal 
needs to earn his reputation elsewhere. 
But the field is rapidly becoming respect- 
able, and the research findings can be use- 
ful to college economics teaching. 

Male-Female Differences in Economic 
Education: A Survey 

John J. Siegfried 



There have been numerous efforts to examine the relationship between gender 
and student performance in the college-level introductory economics course. Most 
such studies have examined the association of gender with student learning as a 
by-product of their primary objective. 1 Student gender is usually added as a control 
variable in statistical studies associating some experimental teaching technique with 
student learning. For this reason, many of the studies which evaluate the association 
of sex with student learning suffer from methodological deficiencies. 

There also have been some studies of the association of gender with student 
performance in economics in primary grades and in high school. Finally, there is a 
little evidence concerning gender differences in understanding economics among 
college graduates. 

This survey first examines the hypotheses that are commonly used to justify the 
inclusion of a binary variable for gender in models of the "production function" for 
learning. Second, the distinction between understanding and learning economics is 
made, and its implications for the empirical studies are described. Third, the many 
empirical studies examining the association of gender with economic education are 
summarized and evaluated. The effect of gender on learning and understanding is 
distinguished and the question of when gender differentials appear is addressed. 
Fourth, gender differences in the effectiveness of alternative teaching techniques are 
discussed. The final section reports differences between men and women students in 
their enjoyment of and interest in economics. 

Hypotheses 

The standard hypothesis underlying most empirical tests of the effect of gender 
on economic learning or understanding is that female students have grown up in a 
cultural environment in which girls are not supposed to like business and thus have a 
disadvantage in business or economics courses. More recently, Garron has argued 
that the difference between male and female performance in learning spatial and 
numerical skills, which are related to understanding economics, is chromosome- 
linked. 

MacDowell, Senn, and Soper group the sociocultural explanations into four 
different, though related, categories. First, they report that the psychological 
literature consistently finds that identity issues in adolescent women tend to focus on 
their search for a husband, and many think and act as though they will be rejected by 
boys if they depart far from adolescent stereotypes, one of which is that business is a 
man's world. Second, there apparently is evidence that sex stereotypes are stronger in 
higher-income families. Since lower-income families devote a greater share of their 
available income to sending male offspring to college (because college support is 
rationed to sons before daughters), there will be a greater than average proportion of 
women in college who subscribe to sex stereotypes. Third, young females are 
significantly more dependent than males. It is possible that economics courses are 
taught in ways that penalize dependency, for example, in large lecture formats. Thus 
teaching techniques may be better suited to males than females. Finally, differences 
in maturation rates may explain differences in economics skills. The psychological 
literature finds that people who mature earlier have higher verbal learning rates while 
spatial skills are less developed for any given age. Since women generally mature 
earlier than men, we would expect women to have a comparative advantage in verbal 
skills while men obtained a comparative advantage in spatial (economics?) skills. 2 
Ladd adds a fifth possibility: that teacher attitudes and learning material, especially in 
precollege education, may be at fault. 

There is, of course, no reason to single out any one cause of the differences in 
economics mastery between men and women. Several or all of the hypotheses may be 
working simultaneously. 

Understanding Versus Learning Economics 

The level of understanding is the stock of knowledge about economics and 
business at a point in time. Learning is the flow of new knowledge that occurs over a 
period of time. 

If there is a concern about the differences in economic understanding between 
adult males and females, and one desires to find out what actions might be taken to 
reduce these differences (presumably by improving the cognitive levels of women 
rather than reducing those of men), then determining the age at which the differentials 
appear is important. For example, do the differences in understanding of business and 
economic concepts grow slowly over the years, or is there a pattern of no differences 
until a certain age after which differences appear and then remain stable thereafter? If 
the latter pattern characterizes the development of differences in understanding, then 
we could focus corrective efforts on people during a certain stage of development. On 
the other hand, if the differentials grow slowly, the cost of reducing the differentials is 
likely to be greater since action would have to be taken over a broader group of 
women, of various ages. 

Most of the studies of gender and student performance in economics fail to 
consider explicitly whether it is the stock or flow of knowledge that they expect 
gender to influence. The design of most experiments that include a binary variable for 
gender is aimed primarily at evaluating some alternative teaching technique, so little 
attention is devoted to whether the impact of gender on performance is specified 
sensibly. Consequently some studies measure student understanding at a point in time 
and other studies measure learning of economics over a period of time (usually during 
a course). This variety provides an opportunity to distinguish the effect of gender on 
the stock of economic understanding from the effect on flow. 

Studies that examine the stock of knowledge at a point in time generally report 
correlations between student performance on a final examination and student gender. 
Studies that assess the relationship between gender and the flow of knowledge either 
use a "value-added" measure of performance, or include a pretest control variable in 
their model before correlating a post-test performance measure with gender. 
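
The stock-flow distinction can be made concrete with a small sketch. The records below are invented purely for illustration and are not drawn from any study surveyed here; the names `students`, `stock_gap`, and `flow_gap` are hypothetical. The sketch compares a point-in-time measure (post-test means) with a value-added measure (post-test minus pretest).

```python
from statistics import mean

# Hypothetical records (invented for illustration): (male, pretest, posttest).
students = [
    (1, 14, 20), (1, 16, 22), (1, 18, 24), (1, 12, 18),
    (0, 10, 16), (0, 12, 18), (0, 14, 20), (0, 8, 14),
]

def gap(rows, score):
    """Mean score for men minus mean score for women."""
    men = mean(score(r) for r in rows if r[0] == 1)
    women = mean(score(r) for r in rows if r[0] == 0)
    return men - women

# Stock: understanding measured at a point in time (the post-test).
stock_gap = gap(students, lambda r: r[2])

# Flow: learning during the course, measured as value added (post minus pre).
flow_gap = gap(students, lambda r: r[2] - r[1])

print(stock_gap, flow_gap)
```

In these made-up data men lead by 4 points on the post-test, yet both groups gain the same amount during the course, so a study of the stock would report a gender difference while a study of the flow would report none.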

The Empirical Evidence 

It is difficult to summarize and compare the various studies of gender and 
economic education because they vary considerably in research methodology, sample 
size, and unit of observation, and frequently they confuse the hypotheses. In addition, 
different levels of statistical significance are used, with the result that a finding which 
would be statistically significant at the .05 level may be declared insignificant if the 
significance criterion is .01. Rather than try to unravel these differences in statistical 
methodology, we have simply based significance conclusions on the criteria 
employed in the particular studies. Therefore, the summary provides only a rough 
indication of the findings. Finally, although a count of the studies which find positive 
or no effects of gender on economic education may be illuminating, one must 
recognize that the quality of the research varies, and there is no compelling reason to 
assign equal weights to each of the studies. 
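
As a hedged illustration of the point about significance criteria, the sketch below takes a hypothetical test statistic (z = 2.2, chosen only for the example) and shows that the same finding passes at the .05 level and fails at the .01 level. The normal approximation and the function name `two_sided_p` are assumptions of the sketch, not a description of any surveyed study's method.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 2.2  # hypothetical test statistic on a gender coefficient
p = two_sided_p(z)

for alpha in (0.05, 0.01):
    verdict = "significant" if p < alpha else "not significant"
    print(f"alpha={alpha}: p={p:.4f} is {verdict}")
```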

Two-thirds of the studies that related the level of understanding with gender 
found that men performed statistically significantly better than women. Only 
one-third of the studies that examined the flow of students' learning during (mostly 
college) courses found that men did statistically significantly better. Although there is 
a substantial division of opinion, in general the empirical research seems to suggest 
that by the time people reach college age, men are significantly ahead of women in 
understanding economics, but both sexes are progressing at equivalent rates; thus the 
gap would appear to be stable by that time. These results are consistent with Buckles 
and Welsh's findings that there is very little change in the rank of students in the 
principles course from the beginning to the end. 

Data on differences in understanding economics between the sexes by level of 
schooling can reveal the time when such differences develop. In a study of 
elementary-level students, Davison and Kilgore found no sex-related differences in 
the stock of economics understanding. Ladd surveys evidence that concurs with the 
Davison-Kilgore conclusion. By the time students graduate from high school, 
differences begin to appear. Highsmith in a study of grades 7-12, Moyer and Paden in 
a study of high school students, and Thornton and Vredeveld, also examining high 
school students, all found a statistically significant advantage for males in the level of 
understanding of economics. Becker, Helmberger, and Thompson report contradic- 
tory evidence while evaluating Project DEEP's effect on high school students' 
understanding of economics. MacDowell, Senn, and Soper have done the most 
extensive study of the effect of gender on high school students' performance in 
economics. In an evaluation of the World of Work Economic Education (WOWEE) 
Project, they gathered test scores for about two thousand Illinois junior and senior 
high school students on the nationally normed Junior High School Test of Economics. 
They regressed post-WOWEE test scores on pretest scores, a binary variable for 
students' gender, a binary variable for teacher's gender, and a binary variable which 
was 1 if the student's and teacher's gender were the same, and zero otherwise. Their 
results indicate no difference in post-test scores on the basis of students' gender. The 
students of male teachers did significantly better. The student-teacher interaction 
variable was nonsignificant. MacDowell-Senn-Soper conclude that gender differ- 
ences have not yet surfaced by about age 15, the typical age of students in their 
sample. However, this conclusion seems unwarranted because they control for pretest 
scores in their analysis. The measured effect of gender in their study is on the flow of 
knowledge during the World of Work Project. If males were already ahead of females 
by the time the project commenced, the empirical results would not reveal the 
difference. Thus, their study appears to reveal no gender-related differences in 
student learning of economics in secondary schools. 
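
The objection can be sketched with a regression of the same general form: post-test score on pretest score, a binary for student gender, a binary for teacher gender, and a same-gender indicator. Everything below is an assumption made for illustration: the data are constructed, not the WOWEE data, with men starting 3 points ahead and both sexes learning identically; the helper `ols` is a minimal pure-Python least-squares routine.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Constructed data: (pretest, student_male, teacher_male, same_gender, posttest).
rows = []
for student_male in (0, 1):
    for teacher_male in (0, 1):
        for base in (8, 10, 12, 14):
            pre = base + 3 * student_male          # assumed pre-existing gap
            same = 1 if student_male == teacher_male else 0
            post = 5 + pre + 2 * teacher_male      # identical learning by sex
            rows.append((pre, student_male, teacher_male, same, post))

X = [[1, pre, s, t, same] for pre, s, t, same, _ in rows]
y = [r[4] for r in rows]
names = ["intercept", "pretest", "student_male", "teacher_male", "same_gender"]
coef = dict(zip(names, ols(X, y)))

# Raw difference in post-test means, ignoring the pretest.
men_post = [r[4] for r in rows if r[1] == 1]
women_post = [r[4] for r in rows if r[1] == 0]
raw_gap = sum(men_post) / len(men_post) - sum(women_post) / len(women_post)

print(coef, raw_gap)
```

With the pretest in the model, the coefficient on student gender is essentially zero even though men score 3 points higher on the post-test. A zero coefficient in such a specification therefore speaks to the flow of learning, not to whether a gap in the stock already exists.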

The studies all show that the difference in economics understanding between 
males and females has already developed by high school. However, it is impossible to 
quantify the magnitude of the effect and compare it to differences at the college level 
(in order to determine if further widening of the gap occurs at the transition from high 
school to college) because of the absence of a standardized measurement instrument 
that would be appropriate for both high school and college students. 

There are many studies on college students, perhaps because of the relatively 
low cost of acquiring data. For junior college, Thompson finds no difference in levels 
of understanding of economics. Wentworth and Lewis, Weidenaar and Dodson, and 
Lewis, Wentworth, and Orvis find no relationship between learning introductory 
economics and students' gender. 

For students in the principles-of-economics course in four-year colleges in the 
United States, many studies reveal no difference in learning of economics during the 
course: Buckles and McMahon; Kelley (1975); Paden and Moyer; Lewis and Dahl 
(1972a); Weidenaar; Gery; Ramsett, Johnson, and Adams; Siegfried and Strand; and 
Elliott, Ireland, and Cannon. In contrast, there are relatively few studies which report 
that there is a statistically significant difference in learning of economics during the 
principles course on the basis of gender: Crowley and Wilton; Sloane; Tuckman; 
Soper and Thornton; and Soper (1973, 1976). There exist carefully done studies with 
relatively large samples on both sides of this issue, but the preponderance of the 
evidence on student learning and gender in introductory economics suggests no 
difference. 

The findings regarding gender-related differences in level of understanding in 
the college principles-of-economics course in the United States contrast with the 
absence of a gender difference in learning. Those studies finding that men do 
significantly better include Bach and Saunders; Bolch and Fels; Allison (1976a, 
1976b, 1977); Marston and Lyon; Soper and Thornton; Lewis and Orvis; Soper 
(1973); Chizmar, Hiebert, and McCarney; Clauretie and Johnson; Siegfried and 
Strand; and Saunders (1975). In contrast, no gender difference even on absolute 
levels of performance at the principles level was discovered by Paden, Dalgaard, and 
Barr; Emery and Enger; Lewis and Dahl (1972b); Danielsen and Stauffer; Marston, 
Lyon, and Knight; and Morgan and Vasche. Here the weight of methodological 
superiority and sample size belongs to those studies which have found a statistically 
significant advantage for men in absolute performance. 3 

Although Harbury and Szreter report no gender differences in the level of 
understanding of economics among U.K. college students, in a study of 4,700 U.K. 
college students at thirty-seven universities, Attiyeh and Lumsden (1971, 1972) 
report that men do better on measures of both understanding and learning. Palmer, 
Carliner, and Romer found no learning difference between men and women in a 
Canadian introductory course. 

The tests at the principles-of-economics level were based on samples for all 
types of colleges and universities (large and small, private and public, northern and 
southern), using various evaluation instruments (TUCE, TEU, TEC, and CLEP), and 
have been parts of evaluations of all types of teaching techniques (programmed 
learning, self-paced instruction, televised lectures, computer-assisted instruction). 

There is evidence that male-female differences in understanding persist beyond 
the introductory course, and may even grow. Saunders and Bach and Bach and 
Saunders (1965, 1966) assessed the level of understanding of seniors in college, high 
school teachers who watched the television program, "The American Economy," 
and a large sample of graduates of U.S. colleges. They found that men did better on 
almost all of their tests of understanding of economics. The most comprehensive 
evidence available on the persistent differences in understanding between female and 
male adults is from the Bach and Saunders (1966) and the Saunders (1973) studies. 
Bach and Saunders found that men who graduated from prestige schools, large 
universities, teachers colleges, and "other schools" did significantly better on the 
TEU than women with similar educational backgrounds. The sample sizes were 103, 
503, 1,251, and 1,945, respectively. The diversity of schools represented in the 
sample lends considerable confidence to their findings. Likewise, Saunders' 
regressions with 1,220 sophomore, 955 senior, and 1,257 alumni respondents from 
twenty-five carefully selected schools found a significantly better performance of 
males than females on a hybrid TUCE for each group. Since there was no pretest 
control or gap-closing measure of achievement, Saunders' study confirms the 
superior level of understanding of economics by males, but not their level of learning. 
The difference showed up at the sophomore level and increased slightly through the 
progression to senior and alumni status. 

More recently, Kohen and Kipps, in a study of 59 students at James Madison 
University, found no difference in the TUCE scores between men and women who 
were beginning an intermediate microeconomics course, after controlling for 
achievement levels at the end of introductory economics, time elapsed since taking 
introductory economics, overall grade point average, and number of credit-hours in 
business-economics-mathematics courses. A test for interaction between gender and 
elapsed time would have revealed any differences in the rate of depreciation of 
economic knowledge, but was not done. 

Bach and Saunders (1966) is the only study surveyed which found a group with a 
statistically significant advantage for women; it was for people who had graduated 
from fifty liberal arts colleges. The sample size was 138. However, Rothman and 
Scott find no difference between men and women on a post-TUCE examination when 
women were at a significant disadvantage at the beginning of the course (measured 
by pre-TUCE), implying greater learning by women during the course. In a study of 
what students learn from the economics major at Harvard after their introductory 
economics course, Hartman found no differences between men and women. 

Probably the greatest insight into the reasons for differences in performance of 
men and women on economics achievement tests is found in Allison's (1977) 
investigation of the educational production process in introductory economics at 
Harvard. Using a three-equation, simultaneous nonlinear equations model of the 
learning process, she found that there was no gap between men and women in 
understanding economics by November of the fall semester, but by January men 
performed slightly better, and by June (of the second semester) men performed 
significantly better. A separate equation in the model confirmed that this difference 
was not due to a greater effort being exerted by men. 

The explanation, according to Allison, was uncovered in separate estimates of 
the model for men and women, which revealed that the marginal product of hours 
spent studying economics was greater for men than for women. When the year-long 
course began men had about a two to one advantage in marginal product of effort, and 
by the end of the course this gap had widened. Consequently, Allison developed the 
following account of the source of differential performance: "Women enter the 
course with less "skill" in learning analytic material, less practice in that peculiar 
intellectual exercise of model building. Thus, learning analytic material comes 
slowly; much of each hour's studying is misdirected. Consequently, at the end of 
the semester they have learned relatively little economics per unit of time, and even 
less economic theory. Nor have they learned on the average analytical material as well 
as their male counterparts. Thus, in the second semester women are, given the 
cumulative nature of economics, doubly disadvantaged. Since (as revealed by other 
evidence in the study) they are more sensitive to grades than their male counterparts, 
they do not reduce their effort during the second semester. But as indicated in the 
second semester enjoyment equation, they ultimately find the experience relatively 
unsatisfying" (1977, pp. 40-41). 

Allison's account is consistent with evidence on the difference in performance 
between men and women on particular types of achievement examination questions. 
She found a much greater disadvantage for women on theory questions than on other 
types of questions. Saunders (1975) found that the positive association of male sex 
with test performance (level of understanding) on the hybrid TUCE tended to be 
stronger on simple application and complex application questions than on recognition 
and understanding questions, although the differences were not significant (and there 
is one exception). Elliott, Ireland, and Cannon also found the disadvantage of females 
to be greater on application than recognition and understanding questions in a study 
that measured learning during the introductory course. Finally, Allison's account of 
the time pattern of the difference in test performance is in accord with her explanation 
that its source is based in analytical skills differences, 4 since the insignificant gender 
difference in performance by November of the first semester is more likely to reflect 
achievement on recognition and understanding skills vis-a-vis the commonly greater 
analytical content included in examinations as the course progresses. 5 

Interactions Between Teaching Techniques, Ability, and Sex 
Differences in Learning Economics 

Chizmar, Hiebert, and McCarney ran separate regressions to explain the level of 
economic understanding of students at Illinois State University in a computer-assisted 
instruction class and a class without CAI. They found no gender-related differences in 
the level of economic understanding on the basis of the method of teaching. In 
contrast, Allison (1977) and Siegfried and Strand found that women taking the 
principles-of-economics course at Harvard and Vanderbilt did relatively better in a 
self-paced-instruction (SPI) class than in a mass lecture. This finding is consistent 
with the theory that women respond better to environments that accommodate 
dependence: in an SPI course, tutors are assigned to each student and work closely 
with the individual. However, the evidence is also consistent with Allison's account, 
namely, that the difference in analytical skills is the fundamental source of the gap. 
Drill in analytical materials is generally more intense in SPI courses, thereby focusing 
instruction on the weak area of preparation for most women. 

There are several interesting findings that go beyond the simple average 
male-female difference in performance on economics tests. For example, Sloane 
found that men did better overall using a learning criterion in his principles-of- 
economics course, but the gender difference disappeared for the top and bottom 
quartiles of the class. Apparently it is manifested almost entirely in the middle range 
of the class (ranges determined by pre-TUCE). Sloane also found that women did 
relatively better with the more inexperienced instructor than did men. Allison (1977) 
also analyzed the effect of gender on achievement separately for high- and 
low-achieving students. She found that the female disadvantage was more pronounced 
for the bottom quarter of achieving students. Allison does not report similar 
subregressions for the middle two quarters. Consequently, although her findings do 
not appear to be similar to Sloane's finding regarding a nonlinear effect of 
achievement levels on sex differences in understanding economics, a conclusive 
determination cannot be made. 

Sex Differences in Attitudes toward Economics 

Allison (1976a, 1977) reports that women showed a statistically significantly 
greater level of enjoyment of the principles-of-economics course at Harvard even 
though they understood less economics. In spite of their greater enjoyment and lower 
achievement, there were no apparent differences in work effort related to student 
gender. Kelley (1972) also found that women tend to like the principles-of-economics 
course better, but he found no difference in ratings of the professor on the basis of 
student gender. Wentworth and Lewis and Paden and Moyer, however, report no 
significant male-female differences in attitudes toward economic instruction. 

In his study of the lasting effects of economic instruction, Saunders (1973) found 
greater interest in economics (as self-reported on a questionnaire) among men than 
women for sophomores, seniors, and alumni who had not had any college economics. 
However, among students who had had an introductory course, there was no 
significant difference in levels of interest between men and women. This suggests that 
interest differences between the sexes are manifested in course selection rather than 
being an important determinant of variations in performance during the course. 

Tuckman found that males are more likely to continue their economic training by 
taking further courses. Allison (1976a), however, found that gender was useless as a 
predictor of re-enrollment in economics courses. 

Mann and Fusfeld found that men had both a greater level of sophistication of 
attitudes toward economics and gained a little more attitude sophistication during the 
course than did women. However, their measure of attitude sophistication has been 
criticized by Rothman and Scott, for, among other reasons, being a measure of 
political liberal values. More recently, Sosin and McConnell examined the effect of 
an introductory macroeconomics course on student attitudes toward the distribution 
of income. They found that, in general, the course led to a statistically significant 
move in attitudes away from conservatism and toward an egalitarian attitude about 
income distribution. They hypothesized that women would experience a greater shift 
in their attitudes toward an egalitarian distribution of income because they might 
anticipate a work-life cycle involving periods of dependence upon intrahousehold 
transfers of income. The empirical results supported their hypothesis. 

Davisson and Bonello, in a large study of computer-assisted instruction at Notre 
Dame, found comparatively little difference between men and women in attitudes 
toward economic institutions, problems, and policies. They asked students to 
classify, for example, government spending deficits, government controls of wages 
and prices, poverty, inflation, increasing the money supply, labor unions, big 
business, market mechanisms, etc., as good or bad, inevitable or controllable, 
effective or ineffective, etc. 

Conclusion 

The scant evidence on learning and understanding economics at the elementary 
school level indicates few differences between the sexes. However, by the high 
school years gaps appear to develop. Differences in understanding seem to persist 
through the college years, but there does not appear to be any widening of the gap. 
Most of the research from which these conclusions have been drawn was designed for 
other purposes, and consequently is not satisfactory enough to resolve the issue 
definitively. 

FOOTNOTES 

1. The primary objective is usually to assess a particular teaching method. See Siegfried and 
Fels for an extensive catalog and critique of these studies. 

2. The puzzle surrounding this argument, however, is that most studies of achievement in 
introductory economics find that verbal SAT rather than quantitative SAT is more important 
in explaining success (see Fels and Siegfried). 

3. In a study of the economic statistics course, Cohn found no significant difference between 
the performance of men and women in his course, although the sample of women was 
relatively small. 

4. In a recent survey of male-female differences in precollege economics education, Ladd cites 
evidence that there are no differences in analytic ability or concept mastery between men 
and women. Apparently recent work in psychology argues that the commonly perceived 
difference is confined more narrowly to differences in visual spatial skills than general 
analytic ability. Ladd points to differences in quantitative and verbal abilities as a factor that 
explains male-female performance differences. "If economics at the precollege level 
requires more quantitative ability than verbal ability (a debatable issue), however, some 
differential learning ability for economics may exist" (p. 147). However, this is contrary to 
the relatively greater impact of SAT verbal scores than SAT quantitative scores on cognitive 
achievement in introductory economics courses (see Siegfried and Fels). 

5. Another possible explanation for the observed difference between men and women in 
economics achievement is sample self-selection bias. For example, if general tastes for 
learning economics vary systematically by sex, and the appropriate skill distribution for 
men and women is similar, the difference in observed performance could be due to 
comparing the tail (upper, presumably) of one distribution, likely female, with a broader 
cross section of the other distribution, male. Such a possibility would normally lead one to 
expect a bias toward showing that women do better than men. That the studies almost all 
show either no effect or worse performance by women is not necessarily inconsistent with 
this idea. Indeed, this bias could well mean that the true disadvantage of women in learning 
and understanding economics is understated by the reported empirical tests because the tests 
are biased in favor of showing an advantage for women, due to the selectivity of women who 
elect economics. One way of learning more about this process would be to explore the effect 
of interaction variables (between the usual binary variable for gender and some taste 
variable, say major) in the regression analyses. To my knowledge this has never been done. 
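
The selection argument in footnote 5 can be illustrated with a deliberately artificial sketch. All numbers are assumptions: both sexes share the same skill distribution (0 to 100), every man enrolls, only women with skill of 60 or more elect economics, and a hypothetical true disadvantage of 10 points is then imposed on enrolled women.

```python
from statistics import mean

# Assumption: men and women share the same skill distribution (0..100).
population_skill = list(range(101))

# Assumption: every man enrolls; only women in the upper tail elect economics.
enrolled_men = population_skill
enrolled_women = [s for s in population_skill if s >= 60]

# Observed gap (men minus women) when there is no true difference.
observed_gap = mean(enrolled_men) - mean(enrolled_women)

# Now impose a hypothetical true disadvantage of 10 points on enrolled women.
true_disadvantage = 10
observed_with_disadvantage = mean(enrolled_men) - mean(
    s - true_disadvantage for s in enrolled_women
)

print(observed_gap, observed_with_disadvantage)
```

Under these assumptions the observed comparison favors women by 30 points when no true difference exists, and still favors women by 20 points when a true disadvantage of 10 is present, so a measured male advantage in self-selected samples would understate the true gap.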
ECONOMIC EDUCATION RESEARCH: ISSUES AND ANSWERS 

Research on Economic Education: 
Is It Asking the Right Questions? 

By Burton A. Weisbrod* 



The division of responsibility between the 
two papers at this session is a fascinating one. 
One author has agreed to examine the ques- 
tions that are being asked in the economics 
education literature, and the other to examine 
the answers! 

As is so often the case, however, the 
underlying assumption of separability does 
not hold. A research question is not a "good" 
or "bad" question independent of the quality 
of the answers it is likely to generate. An 
"exciting" question that is unlikely to yield an 
answer of substantial value is not a good 
question. Research is a production process in 
which something called "useful knowledge" is 
the output. The inputs to this process include 
both the specification of questions that are 
important (in the sense that the answers 
would have great expected value) and the 
marshaling of resources (i.e., the incurring of 
costs) to answer the questions. 

If the costs of answering all research ques- 
tions were equal, or were random with respect 
to the significance of the question, then the 
separability of the decisions on question spec- 
ification and on question answering would be 
justified. What we probably confront, how- 
ever, is a less fortuitous set of conditions in 
which the questions that are most valuable to 
answer are also the most costly (i.e., difficult). 
The issue I have been asked to deal 
with, whether research on economic education 
is asking the right questions, thus 
involves implicitly the performance of a bene- 
fit-cost analysis on project selection in the 
area of economic education research. An 
evaluation is needed of (a) the expected 
benefits (more precisely, the probability 
distribution of benefits conditional on answers 
of various quality), and (b) the expected costs 
of obtaining answers of each quality. 

It is possible, of course, that a particular 
research question may be a good one in the 
efficiency sense that the expected costs of 
researching it are less than what the expected 
benefits would be if the resources devoted to 
the research were used as productively as 
possible; yet if the resources were not used so 
productively, it might fail the allocative effi- 
ciency test. Thus, a question could be poten- 
tially efficient to research but actually ineffi- 
cient. In any event, the "best" questions to 
research are those for which the excess of the 
value of the expected answers (benefits) over 
the expected costs of the research is maximized. 
Deciding which are the "right" questions 
to research implies a benefit-cost (efficiency) 
analysis for the prospective project 
that is essentially the same as for any other 
resource-using project, such as in water 
resources or manpower training. Thus, upon 
careful scrutiny the imaginative effort by the 
organizers of this session to break a 
monstrous evaluation task into two distinct 
evaluations fails to pass the test of separ- 
ability. 
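
The selection rule described above can be sketched numerically. The two research questions, their probabilities, benefits, and costs are all hypothetical values in arbitrary units, chosen only to show the calculation: expected benefit is taken over the distribution of answer quality, and the "best" question is the one with the largest excess of expected benefit over expected cost.

```python
# Hypothetical research questions. Each lists (probability, benefit) pairs for
# answers of differing quality, plus the expected cost of the research.
questions = {
    "broad_costly": {"answers": [(0.25, 200), (0.75, 20)], "cost": 60},
    "narrow_cheap": {"answers": [(0.5, 30), (0.5, 10)], "cost": 5},
}

def expected_net_benefit(q):
    """Expected benefit over the distribution of answer quality, minus cost."""
    expected_benefit = sum(p * b for p, b in q["answers"])
    return expected_benefit - q["cost"]

net = {name: expected_net_benefit(q) for name, q in questions.items()}
best = max(net, key=net.get)

print(net, best)
```

In this invented case the more "exciting" question has the larger possible payoff but the smaller expected net benefit, which is the sense in which question selection and answer quality cannot be judged separately.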

Despite my conclusion regarding the simul- 
taneity of judgments on which are the right 
(best) questions to ask and on the costs and 
quality of expected answers, I shall proceed. 
In the remainder of this paper, I try to 
identify the nature of the research questions 
that have been posed in the economic education 
literature, and the nature of the questions 
that have not been posed. I will comment on 
whether the overall research program— the 
set of questions being asked — is what it 
"should" be, and attempt to point to research- 
able themes that are likely to have relatively 
high returns for research in this field. 

I. Economic Education and General Education 

One basic question is, why study economic 
education at all? What reasons are there to 
believe that the subject matter of economics is 
sufficiently special so that the voluminous 
general literature on teaching and education 
is not applicable to economics? 1 I have not 
seen this question posed in the more than 150 
papers I have surveyed in the Journal of 
Economic Education (JEE) and in the 
annual American Economic Association 
(AEA) sessions on economic education. 2 
(There are, of course, papers on economics 
education published elsewhere, but my survey 
does not extend much beyond these two "offi- 
cial" AEA sources.) My point is simple. Is 
there not a substantial probability — indeed, 
perhaps not a presumption — that researchers 
studying economics education are "rediscov- 
ering the wheel," posing and answering ques- 
tions that have been answered previously in 
the more general research on education? For 
example, is it not likely that the effect on 
"learning" of, say, class size, or of the use of 
teaching assistants rather than more experi- 
enced professors, or of individually self-paced 
approaches rather than a traditional uniform, 
instructor-paced approach, is similar for all 
subjects? I do not assert that the answer is 
obvious and affirmative. I only qujstion 
whether it is a "high priority" research 

1 This is not to say that there is nothing special about 
the teaching of economics. Economists typically believe, 
for example, that people have more misinformation and 
biases concerning economics than about other subject 
matters. (Mark Schlesinger pointed this out to me.) Even 
if this is true (see Kenneth Boulding), the question would 
remain whether resources devoted to teaching economics 
should be deployed differently than in other subjects. 

2 For an excellent survey of research on educational 
production functions, see Eric Hanushek. 



matter to devote substantial resources to 
general questions of teaching techniques; 
questions that are not specific to the teaching 
and learning of economics and that have been 
studied extensively in other subject matter 
contexts. It may well be true that, as one 
economist at an AEA session on economic 
education recently put it, "Educational pro- 
duction functions are at least as interesting as 
those for hybrid corn" (see Elisabeth Allison, 
p. 228). Nonetheless, it would not follow that 
production functions for economic education 
are efficient topics for economists' research. 

II. The Production Function 
for Economic Education 

What research has been undertaken in 
economics education? Most of it is devoted to 
exploring some portion of the production 
function for economics education. Of the 159 
papers surveyed, I count 102 — essentially 
two-thirds — dealing either with how to define 
and measure outputs (23 papers), or with the 
effect on output of various alternative inputs 
(79 papers). This production function orienta- 
tion is consistent, however, with the JEE's 
goal as stated inside the front cover: "To 
promote the teaching and learning of econom- 
ics in colleges, junior colleges and high 
schools by sharing knowledge of economic 
education." 

Table 1 presents the 79 input-output 
oriented papers according to the principal 
type of input the productivity of which was 
being studied, and according to the level of 
schooling. Since each paper was counted only 
once, while some papers touched on more 
than one input or school level, the table is an 
incomplete portrayal of the research foci. 

I have classified the independent variables 
in the production function as capital, labor, 
students themselves, course content, and 
instructional methods (ways of combining 
inputs). An impressive variety of variables 
have been researched. I cannot judge whether 
some inputs that have received little or no 
attention are "worth" studying — for example, 
the output effects of the time of day that the 
class is held (but see Rolf Mirus), the color of 

Table 1. Number of Articles on Various Production Function Relationships for Economics 
Education, by Type of Input and Level of School 

Type of Input                                          All Levels 

I.   Capital                                           20 (25%) 
     A. Textbooks                                       4 
     B. Computers                                       9 
     C. Television, slides, etc.                        7 
II.  Labor                                             17 (22%) 
     A. Instructors                                    11 
     B. Graduate Assistants                             5 
     C. Consultants                                     1 
III. Students (ability, motivation, 
     family background, other students)                 5 (6%) 
IV.  Course Content (subject matter)                   11 (14%) 
V.   Instructional Methods 
     (ways of combining inputs)                        26 (33%) 
     A. Games and Simulations 
     B. Learning contracts, self-paced instruction 
        and programmed learning 
     C. Lectures 
     D. Course evaluations 
     E. Length of course 
     F. Class size 
Total                                                  79 (100%) 

Level of School                                        Total 

Elementary and High School                             12 (15%) 
Junior College                                          5 (6%) 
College                                                61 (77%) 
Graduate School                                         1 (1%) 
Nonschool 
All Levels                                             79 (100%) 



the classroom walls, or the seating arrangements. 

A. Interaction Effects 

What is probably a more serious omission 
is the lack of examination of interaction 
effects among input variables. It seems likely, 
for example, that a particular type of text- 
book (input IA) when used by graduate 
teaching assistants (IIB) will be more effec- 
tive for low-ability students (III) than it 
would be for high-ability students. Similarly, 
games and simulations (input VA) may be 
differentially effective depending on whether 
instructors (II A) or graduate assistants (IIB) 
are used and depending on the student's 
initial level of motivation (III). 
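
An interaction of this kind can be sketched as a difference in differences across a two-by-two design: the effect of one input (here, who teaches) is computed separately for each student group, and the two effects are compared. The Python fragment below is only an illustration under stated assumptions. The score gains, the group labels, and the function names are hypothetical and are not drawn from any of the surveyed papers.

```python
from statistics import mean

# Hypothetical test-score gains (post-test minus pre-test), for illustration only.
# Keys are (instructor type, student ability group).
gains = {
    ("graduate_assistant", "low_ability"): [6.0, 7.0, 8.0],
    ("graduate_assistant", "high_ability"): [5.0, 6.0, 7.0],
    ("professor", "low_ability"): [3.0, 4.0, 5.0],
    ("professor", "high_ability"): [5.0, 6.0, 7.0],
}


def cell_means(data):
    """Average gain within each (instructor, ability) cell."""
    return {cell: mean(values) for cell, values in data.items()}


def interaction_effect(means):
    """Effect of a graduate assistant (vs. a professor) for low-ability
    students, minus the same effect for high-ability students."""
    effect_low = (means[("graduate_assistant", "low_ability")]
                  - means[("professor", "low_ability")])
    effect_high = (means[("graduate_assistant", "high_ability")]
                   - means[("professor", "high_ability")])
    return effect_low - effect_high


means = cell_means(gains)
print("interaction effect:", interaction_effect(means))
```

A nonzero value says that the payoff to one input depends on the level of another, which a study reporting only the average effect of each input taken separately would not reveal.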

B. Limited Scope 

Another striking aspect of Table 1 is the 
overwhelming emphasis on teaching at the 



college level (77 percent of the papers). The 
JEE goal, stated above, refers to "colleges, 
junior colleges and high schools." The scant 
attention of economics education researchers 
to high schools and junior colleges is notewor- 
thy, given that half of young people do not go 
beyond high school, and that those who do go 
further are increasingly likely to go to a junior 
college. (Examples of research on economic 
education in junior colleges are Dennis 
Weidenaar and Joe Dodson, and Darrell 
Lewis, Donald Wentworth, and Charles 
Orvis. For a precollege focus, see Rendigs 
Fels (1977) and Thomas Duff.) It may or 
may not be true that the production function 
findings for the college population apply also 
to the junior colleges and high schools; the 
issue merits attention. Students' ability and 
motivation levels (as well as the variances in 
those levels) vary across the schooling levels; 
thus, the interactions of these student char- 
acteristics with other, conventional inputs will 
produce, I hypothesize, different output 
effects depending on the level of school. 

The narrow scope of teaching settings on 
which research has been published is also 
evident from the dearth of attention to the 
production function for teaching economics 
either in graduate schools (see, however, 
W. Lee Hansen and Robert Decker for 
models predicting success in graduate eco- 
nomic studies) or in nonschool settings such 
as in the home via television (see John Cole- 
man) or via popular journalism (magazines 
and newspapers). How "effective," for exam- 
ple, are the syndicated newspaper columns of 
writers such as Sylvia Porter, the Newsweek 
columns by Milton Friedman and Paul Samu- 
elson, the articles in magazines such as Chal- 
lenge or Public Interest, or in daily newspa- 
pers? How effective — and for whom — are the 
efforts of private firms to provide "economic 
education" via newspaper advertisements (for 
example, Mobil Oil on energy issues)? These 
are unanswered — indeed, unasked — ques- 
tions. Yet, the vast majority of people have 
not taken and never will take a formal 
economics course in any school, and they will 
be exposed to economics only through such 
informal media. Thus, the production func- 
tion for learning economics outside traditional 
schools seems to warrant substantial explora- 
tion — assuming, of course, that economics is 
worth the opportunity cost of learning it. The 
omission of nonschool teaching and learning 
of economics from the JEE statement of 
policy is unfortunate. 

C. Distributional Effects 

I turn next to a related aspect of the 
production function work: the distributional 
effects of alternative course contents, input 
combinations, and instructional materials. 
These have been studied to some extent (for 
example, Richard Attiyeh and Keith Lums- 
den; Hansen, Allen Kelley and the author; 
Fred Thompson); yet, based on the evidence 
from the general literature on education that 
a given approach is likely to have substan- 
tially different effects on different "types" of 



students, this dimension seems to deserve 
more scrutiny. Whatever the mean differen- 
tial may be between the output effects of 
different inputs, examination of the variance 
about the mean may disclose systematic 
differences among students according to char- 
acteristics that are discernible at the outset of 
a course. 

III. Outputs 
A. Goals of Economic Education 

The body of research presented in Table 1 
focuses on the productivity of various inputs; 
the dependent variable — output — is generally 
taken as given, typically in the form of some 
test score. There is, however, substantial other 
literature — not in the input-output frame- 
work — discussing the normative question of 
how output ought to be defined and 
measured. There are papers that discuss the 
"usefulness" of a specific output measure, 
particularly the Test of Understanding in 
College Economics (TUCE) (for example, see 
Darrell Lewis and Tor Dahl; Fels, 1977). 
Other concepts and measures of outputs on 
which papers have been published include 
changes in student political attitudes (see 
James Scott and Mitchell Rothman); the 
students' own judgment of effectiveness (see 
Kelley); learning "radical" economics (see 
Richard Edwards and Arthur MacEwen; 
John Gurley); and developing problem-solv- 
ing abilities (see Fels, 1973). In addition, the 
durability or permanence of the effects, as 
distinguished from measures of effectiveness 
obtained at completion of the course, has 
received some attention (see Phillip Saunders; 
Saunders and G. L. Bach). 

Overall, however, the question of what 
economics education ought to be aiming at — 
that is, which outputs should be produced — is 
a question that has received little rigorous 
analysis. The question of what kind or kinds 
of "economic education" to produce is a 
difficult one. Should it be ideologically 
oriented? Should it provide whatever 
"buyers" want? Who are the buyers: 
parents? taxpayers? students? Our custom- 
ary consumer sovereignty model appears to be 
of limited guidance here, given widespread 
consumer ignorance of the importance of 
economic knowledge, and given the external 
benefits from having a population that is more 
sophisticated in its understanding of economic 
processes. In economics education, as in many 
other "professional" markets, buyers are 
poorly informed regarding product quality. 
Even if buyers know their objectives, they 
may know little about the effectiveness of 
particular activities in achieving those objec- 
tives. 

My references to "consumer ignorance" 
and to "external benefits," however, are 
scarcely more than assertions. I have seen 
little research that rigorously confronts the 
question of whether there is a market failure 
in the economic education market, with too 
few people studying too little economics or 
studying the "wrong" economics. The pub- 
lished research either asserts that more 
economics is good— and presumably is better 
than some unspecified alternative uses of 
student time and other resources — or else the 
research asks the narrower production func- 
tion question of how effective one type of 
input is compared to another, without asking 
whether the output is worth producing. In 
volume 1 of the JEE George Stigler (p. 78) 
did pose the question "Why should people be 
economically literate, rather than musically 
literate, or historically literate, or chemically 
literate?" I will resist the temptation to 
discuss his answer, except to note that musi- 
cians, historians, and chemists may see things 
differently. 

B. Effectiveness vs. Allocative Efficiency 

The domination of a production function 
emphasis in economic education research has 
obscured the related issue of the allocative 
efficiency of alternative input combinations. 
Many papers have examined the effectiveness 
(productivity) of various inputs, but rarely 
have the relative costs of the inputs been 
juxtaposed to the relative effectiveness, nor 
have the measures of effectiveness been trans- 
lated into values of benefits. These questions 



have seemingly been overlooked or, at least, 
slighted. 
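
The distinction between effectiveness and allocative efficiency can be made concrete with a small calculation. The sketch below is illustrative only. The three inputs, their score gains, their costs per student, and the dollar value placed on a test point are all hypothetical assumptions, not estimates from the literature surveyed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeachingInput:
    name: str
    gain: float  # mean test-score gain per student (hypothetical)
    cost: float  # cost per student in dollars (hypothetical)

    @property
    def gain_per_dollar(self) -> float:
        """Effectiveness juxtaposed to cost: points gained per dollar spent."""
        return self.gain / self.cost


def net_benefit(item: TeachingInput, value_per_point: float) -> float:
    """Translate effectiveness into a benefit value and subtract the cost."""
    return item.gain * value_per_point - item.cost


inputs = [
    TeachingInput("televised lectures", gain=4.0, cost=20.0),
    TeachingInput("programmed learning", gain=5.0, cost=50.0),
    TeachingInput("small classes", gain=6.0, cost=120.0),
]

most_effective = max(inputs, key=lambda item: item.gain)
most_cost_effective = max(inputs, key=lambda item: item.gain_per_dollar)

print("most effective:", most_effective.name)
print("most cost-effective:", most_cost_effective.name)
for item in inputs:
    # Assumes, purely for illustration, that a test point is worth 15 dollars.
    print(item.name, net_benefit(item, value_per_point=15.0))
```

With these assumed figures the input that ranks first on effectiveness ranks last on gain per dollar and shows a negative net benefit, which is the sense in which a production function finding alone does not settle the allocation question.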

I find it surprising that among the (admit- 
tedly small number of) papers confronting the 
question of how to define the output or 
outputs of economic education, there has been 
so little attention to labor market effects in 
general, and earnings effects in particular. 
The contrast between the economic education 
literature and economics of education litera- 
ture is dramatic. The latter has concentrated, 
typically within a human capital theoretic 
framework, on the relationship between 
education (meaning schooling) and earnings, 
virtually disregarding the process through 
which educational inputs produce the outputs 
that have value in the labor market. Another 
way of saying this is that the economics of 
education literature has viewed earnings as 
the value of outputs. Meanwhile, the 
economic education literature has concen- 
trated heavily on the process of converting 
inputs into outputs in nonpecuniary forms, 
virtually disregarding the valuation of out- 
puts. 

One might have predicted a priori that the 
economic education literature would have 
included numerous efforts to assess the labor 
market value of economics training either 
directly or indirectly through its effect on, 
say, the probability of admission to law 
school. Why the economic education litera- 
ture and the economics of education literature 
have been so divergent, and whether either, or 
both, or neither has pursued an "optimal" 
path are questions which I raise here, but will 
not pursue far. 

C. Lifetime Effects 

The human capital framework, within 
which much of the economics of education 
literature has been cast, has focused research 
attention on the investment aspect of school- 
ing. The investment emphasis implies a life- 
time perspective on the outputs of schooling. 
By sharp contrast, the economic education 
literature has concentrated overwhelmingly 
on the immediate outputs, those measured at 
the completion of the course. As pointed out 
above, there have been a few noteworthy 
exceptions in which the durability of outputs 
has been considered, though even these have 
involved a horizon of only a few years or so 
(see Saunders; Saunders and Bach). It may 
well be exceedingly difficult to measure life- 
time effects of exposure to economics, and 
this may explain the lack of attention to this 
question in the literature. (This would illus- 
trate the interrelatedness of the "do-ability" 
of research and the formulation of research 
questions.) But the fact remains that little 
effort has been devoted to the measurement of 
lifetime effects. 

IV. Incentive Structures 

Another underresearched area is the nature 
of incentive structures facing teachers and 
administrators. Assume that (1) the pro- 
duction function research disclosed that 
certain inputs are more effective than others, 
(2) consensus was reached on appropriate 
measures of outputs (i.e., effectiveness), and 
(3) outputs and inputs were valued and showed 
positive net benefits from a change in current 
teaching practices. Would the changes occur? 
Are there incentives sufficient to encourage 
changes that are efficient (granting that such 
changes can be identified with reasonable 
confidence)? 

These questions, it might be argued, tran- 
scend economic education. It would seem, 
however, that the responsiveness of teachers 
and administrators of economic education 
programs may or may not be the same as for 
those in noneconomics areas; at least this 
hypothesis cannot be ruled out, any more than 
can the hypothesis that the effect of class size, 
of teaching assistants, or of television 
instruction differs as between economics 
and other subject areas. 

The nature of incentives confronting teach- 
ers (of economics or of anything else, and at 
various levels of schooling) has received 
scant attention. There are possible incentives 
for instructors (a) to learn which changes are 
efficient, and (b) to make those changes. (On 
the latter point, studies of salary determina- 
tion at universities have shed some light on 
the financial returns to scholarly research, 
teaching, and other uses of faculty time; see, 
for example, John Siegfried and Kenneth 
White, and James Koch and John Chizmar.) It is 
arguable that little is to be gained from 
research on how to "improve" teaching if the 
incentives to adopt improved methods are 
weak. It is also arguable, on the other hand, 
that incentives are weaker than they might be 
because there is so little agreement as to what 
constitutes efficient teaching; this, after all, 
involves the specification of goals in opera- 
tional terms and the adoption of value weights 
for the multiple goals that surely exist. Thus, 
understanding goals and weights is one part 
of the research agenda for efficient innovation 
in education. 

In any analysis of incentives in education 
the relationship between private costs and 
social costs (or returns) is likely to be crucial. 
As an illustration, consider the case of an 
economics instructor who is free (although 
many are not) to select any undergraduate 
textbook, and suppose that a new textbook 
appears on the market. There may well be little incentive 
(financial, professional, or any other) to read 
the new textbook carefully enough to deter- 
mine whether it is superior to the one being 
used; this, however, is not my principal point. 
What if the instructor knew — costlessly and 
with certainty — that the new book was "more 
effective" for all of his students; what would 
be the private and social costs and benefits of 
adopting the new book? Of course, more 
effective need not imply "more efficient." 

From the students' viewpoint, the new book 
would presumably be preferred if it were 
more effective. Such a preference, in turn, 
embodies two deeper assumptions: (a) the 
similarity of student goals and of faculty goals 
for students, and (b) the absence of higher 
costs (time, effort, money) for using the new 
book that offset the benefits of increased 
learning. 

Note, however, that while the student must 
incur the cost of reading whatever textbook is 
chosen — an essentially fixed cost — the fac- 
ulty person bears an increased real social cost 
of changing, since he or she has lecture notes 
keyed to a textbook that has already been 
read. With the benefits of change accruing to 
students while the costs are borne by faculty, 
the likelihood of market failure is substan- 
tial. 

The market failure would disappear, 
however, if the instructor internalized the 
students' benefits. This might appear to be 
the case if the instructor acted as an idealized 
"professional" — that is, acted as the consum- 
er's agent for maximizing the consumer's 
(student's) utility. Education is an example of 
a commodity — like medical care and legal 
representation — in which consumers are 
aware of their inability to judge quality, and 
so they place trust in the professional to act in 
their best interest. Even if the instructor were 
to behave, however, so as to maximize not his 
or her own utility but that of students (or 
parents, or taxpayers), it would not follow 
that efficient resource allocation would result. 
The reason is that the cost of switching 
textbooks (or, in general, of changing 
anything in the teaching process) is a real 
cost; if it were to be disregarded — as would be 
the case if the instructor were to act so as to 
maximize the consumer's utility — the result 
would be excessive change. 

The market failure would also disappear if 
the reward structure were such that the 
instructor's pay were an appropriate function 
of the "value-added." Then, if students 
learned more from the new text, the instruc- 
tor — acting in self-interest — would weigh the 
costs of changing books against the benefit, 
and would choose accordingly. Ideally, the 
rewards would be commensurate with the 
student benefits, and so — assuming away real 
external effects and other market imperfec- 
tions—the instructor would be confronted 
with the real costs and benefits of change. The 
problems of developing such a reward system 
are doubtless great. It does not follow, howev- 
er, that they are not researchable. 
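
The three incentive cases discussed above can be compared in a short sketch. Everything in it is an illustrative assumption: the regime names, the function names, and the dollar figures are hypothetical, and the value-added regime is taken to pass the full student benefit through to the instructor's pay.

```python
def private_gain(student_benefit: float, switching_cost: float, regime: str) -> float:
    """Net gain from changing textbooks as the instructor perceives it."""
    if regime == "no_reward":
        # The instructor bears the switching cost and captures none of the benefit.
        return -switching_cost
    if regime == "idealized_agent":
        # The instructor maximizes student utility and disregards the cost of change.
        return student_benefit
    if regime == "value_added_pay":
        # Pay rises by the full student benefit, so benefit and cost are both weighed.
        return student_benefit - switching_cost
    raise ValueError(f"unknown regime: {regime}")


def adopts(student_benefit: float, switching_cost: float, regime: str) -> bool:
    """The instructor changes books only if the perceived net gain is positive."""
    return private_gain(student_benefit, switching_cost, regime) > 0


def is_efficient(student_benefit: float, switching_cost: float) -> bool:
    """A change is efficient when the real benefit exceeds the real cost."""
    return student_benefit > switching_cost


REGIMES = ("no_reward", "idealized_agent", "value_added_pay")

# Two hypothetical textbooks: one worth adopting, one not.
for benefit, cost in [(100.0, 40.0), (30.0, 40.0)]:
    decisions = {regime: adopts(benefit, cost, regime) for regime in REGIMES}
    print(benefit, cost, "efficient:", is_efficient(benefit, cost), decisions)
```

Under these assumptions the unrewarded instructor never changes books, the idealized agent changes whenever students gain anything at all, and only the value-added regime reproduces the efficient decision in both cases.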

These remarks have been abstract; meat 
must be put on the analytic bones. I hope that 
the next time the economic education litera- 
ture is surveyed there will be found more 
papers exploring incentives for innovation and 
efficiency — in both positive and normative 
dimensions. 



V. Concluding Remarks 

It is all too simple to find questions that one 
would like other researchers to tackle, as I 
have done here. Thus, I should close by 
reiterating my claim made at the outset that 
the selection of optimal research questions is, 
in principle, a matter of weighing benefits and 
costs, of comparing the value of having 
answers to the costs of obtaining them. If the 
costs are sufficiently high, it would be ineffi- 
cient to research questions that seem impor- 
tant. Some of the questions to which I have 
pointed probably fail such a benefit-cost effi- 
ciency test, and so have received, quite wisely, 
little research attention; other questions, 
however, may pass it — at least for some 
researchers — and so merit more study. Once 
more we can conclude that "more research is 
needed." 

Where is the Economics in Economic 
Education? 

Economists can be distinguished from other social scientists not so much by the topics they 
investigate as by the method (that is, the suppositions and theoretical tools) they employ in their 
investigations. Support for this position can be traced through economic literature, although it 
will suffice to note that Keynes defined economics as "a method rather than a doctrine, an 
apparatus of the mind, a technique of thinking, which helps its possessor to draw correct 
conclusions." Although the precise scope of this method is debatable, most economists agree 
that the choice calculus of the individual, as opposed to the group, is at the heart of the 
economist's view of the world. 

Because the learning process for both student and teacher involves choices which are 
basically similar to those which they confront in their other pursuits, the tools of the economist 
have a clear and direct application to research in economic education. To date, however, 
economic education has been predominantly studied with the statistical tools and learning 
theories of professional educators. Given an almost total neglect of the conventional, economic 
approach in the subfield of economic education, one cannot avoid asking: Where is the 
economics in economic education? Do economists not have something to contribute, as 
economists, to the study of how they teach and what they teach? 

In this paper I attempt to examine what economic educators are and are not doing. The 
focus is not on the education process, per se, but rather on how economic educators have 
studied the process and how I believe they should study it— as economists. In this context, the 
paper is openly critical of the status quo, more for what has been neglected than for what has 
been done, and it attempts to be persuasive to the extent that suggestions for redirecting 
economic education research are implied. Admittedly, the nature of the task I have set for 
myself requires that I draw generalizations concerning the work of an economic education 
profession which has many diverse elements; and space limitations dictate that my 
generalizations be more general and less embellished with qualifications than I would like. But 
I proceed on the assumption that we all share common interests and seek answers to many of the 
same questions. In the final analysis, my comments are valid to the extent that we share the 
same point of reference, and for this reason I begin with a preliminary statement of what I 
perceive that point of reference to be. If we disagree on the reference point, we will surely be at 
odds over what we as economic educators should do. 

Richard B. McKenzie is Associate Professor of Economics, Clemson University. The paper was 
written when the author was at the Center for the Study of Public Choice, Virginia Polytechnic Institute and 
State University. He is indebted to his colleagues there, Robert Staaf, Nicolas Tideman and James 
Buchanan, for helpful comments and suggestions. 

SOURCE: Journal of Economic Education, vol. 9, no. 1, Fall 1977, pp. 5-13. Reprinted with permission of the 

The Economic Approach and a Critique of Economic Education 

If economics has any claim to being a science, it is to the extent that it has a positive as 
opposed to a normative tradition. The central concern of the economist has traditionally been 
with what is, and not with what ought to be. The individual, so long the cornerstone of 
economic theory, has been presumed to choose in accordance with his own preferences; no 
obligatory values or behavior patterns have been imposed upon him or presupposed for him. He 
is a free agent, operating within physical, genetic and social constraints. Economic theory in 
this sense is nonnormative, general and applicable where freedom of choice exists. It has 
explanatory and predictive power precisely because of, and not in spite of, assumed 
individuality. And it is virtually axiomatic within the profession that to know the individual is to 
understand the mechanism through which social phenomena (including education) occur; to 
bypass consideration of the individual choice calculus runs the risks inherent within the 
hit-or-miss, ad hoc theorizing so common in other disciplines. 1 

Economic education, per se, has yet to be brought squarely within the positive tradition. 
Granted, many economists have made empirical studies of the education process as it pertains 
to economics, and with a reasonably broad conception of the discipline, such studies may be 
deemed economic in nature. Obviously, a major concern has been to improve the efficiency of 
the learning experience, which many think constitutes an economic problem, one which has 
been seriously ignored in the past. However, the theoretical models and assumptions 
undergirding many of these studies raise issues which must be carefully examined and which 
are far more complex than one may suppose. Let me briefly outline the nature of those issues in 
the hope that others will consider them. 

The Problem of the Underlying Model 

It seems fair to say that most studies conducted in economic education have been based on 
what may be called the "educational model" of student and faculty behavior. This model, 
which is used extensively in education, psychology and sociology, presumes that student and 
faculty behavior is determined by genetics and the many environmental forces (social and 
physical) which are present in the educational setting and that these forces can be so arranged 
that student and faculty behavior is "shaped" to achieve the desired responses. Using this 
model, economic education has been treated as a mechanical process or as an input-output 
problem approximating that of molding a vase from clay. The individual (the student), who has 
in a sense been assigned the passive role of the clay, has been devoid of any hint of the 
rationality normally associated with homo economicus and has, perhaps, been lost behind the 
computer and in the discussion of the performance of the "group" or "class." Without 
explicitly providing for choice, and by viewing student and faculty behavior as that which is 
revealed in data collected after the learning experience has taken place, the assumption 
implicitly and often made is that the independent variables entered on the right-hand side of 
regression equations work directly on the dependent variable on the left-hand side; that is, that the 
individual students respond, although imperfectly, to the punching, pulling and molding of the 
master craftsman (the instructor). Such a casting of roles, no doubt, elevates the importance of 
the economic educator. 

Use of the education model of student and faculty behavior needs to be seriously 
questioned for several reasons. First, it provides educational researchers with a "mind-set" 
which views the learning experience and its outcome, not as that which emerges from the 
interaction of independently motivated individuals with the capacity to act upon (as opposed to 
respond to) social forces, but rather as an experience which must be controlled to obtain an 
objective which is largely derived externally from the learning experience and is imposed on it. 
That which is defined as "good" or "efficient" is not necessarily that which is defined to be 
"good" or "efficient" by the participants in the learning experience. Contrast this perspective 
with the economist's vision of the market in which the values of the individual market 
participants are the criteria for determining the efficiency of the market process. When two 
people trade their wares, the exchange is "efficient" in the sense that a welfare improvement 
has occurred because the market participants define the exchange to be good. They enter into 
the exchange and benefit from it, not because the market environment has been molded so that 
they will enter into it, but because they perceive the exchange to be beneficial. 

The economist's view of the emerging market process is an argument against control, not 
for it. On the other hand, the perspective of the education model, with its externally derived 
efficiency criteria, is an argument for control and, to some extent, a denial of the values of the 
individual participants. To the extent that any mind-set facilitates thinking along certain lines, it 
reduces the cost of drawing certain types of conclusions and increases the frequency with which 
those conclusions are deduced. The conclusion so easily drawn, using the perspective of the 
education model, is that we need to specify our goals more carefully and determine with greater 
precision our standards of performance. Using the education model, it seems likely that 
researchers will be more interested in seeking out "common goals" for the economic education 
process and in extending controls over that process than will researchers who are imbued with 
the conventional economic model in which the reference point is the value set of the individual 
educational participants and the "goodness" of the process is defined by the mutually 
beneficial trades which occur between participants. 

The reader may be concerned at this point. I agree that there is also a problem in viewing 
the learning process completely as a market phenomenon. It is clear that young children 
require guidance from their parents. They learn what to do and what not to do, what is "good" 
and what is "bad," from parental values which implicitly determine the deliberately imposed 
constraints on their behavior. However, notice that the goals which are established in rearing a 
child at home are set at a highly decentralized level and that they vary markedly from household 
to household and even more markedly from neighborhood to neighborhood. The educational 
model may therefore be quite applicable to the problems of parents, and it can justifiably be 
extended to the elementary school and applied to the teaching of certain subjects which are 
acceptable to (practically) everyone. Unanimous agreement on collective issues, such as goals 
for education, is the public counterpart of mutually beneficial trades in a market setting; it 
insures that only Pareto efficient collective decisions are made. The general agreement on the 
goals of certain kinds of primary education, taught in certain ways, insures that the outcome is 
desirable. My concern is whether or not we can take the argument that is used for the application 
of the education model to home-produced education and to elements of primary education and 
apply it to higher education and subjects like economics over which there is broad disagreement 
as to what it is, what it should be, and what should be done with it. These extended applications 
of the education model involve very large and diverse groups of people, and the so-called 
"common goals" require that collective as opposed to individual decisions be made within the 
relevant group. 

Using the education model as their frame of reference, economic educators are 
understandably interested in defining the goals of economic education and of specifying in very 
detailed ways how the attainment of those goals will be measured. My concern is whether or not 
the goals of economic education (as opposed to goals established at highly decentralized levels) 
and whether or not the means of measuring the degree to which those goals are achieved are 
legitimate problems when considered in the context of a national economic education 
movement. It seems to me that the mind-set which one carries with him in his educational 
research will be quite influential in determining how he will react to what I have just said. The 
mind-set of the education model will lead the researcher to the establishment of goals, since 
goals are necessary for the development of further educational research. The mind-set of the 
economic model, on the other hand, will lead the researcher to question any collectively 
established goals for economic education; economic education is not that which it should be, in 
some collectively defined sense. Rather, it is that which emerges from the educational process, 
and that which emerges continuously evolves in many different ways. Indeed, academic 
freedom is a property right which we all enjoy and which recognizes the existence and need to 
promote diversity in the learning experience. This seems to be the spirit in which Jacob Viner 
defined economics as that which economists do. 

Later, I will argue that, if one accepts the "citizenship argument for economic 
education," consistency mandates that one be concerned with common goals.² Having 
established common goals, the relevant research model is the education model. The 
economist's model of learning as an emerging process which is dependent upon individual 
values must be rejected. I will also argue that the citizenship argument is, from an economic 
perspective, a questionable basis for promulgating economic education. Again, these 
arguments relate back to the model that is used and attack the foundations of the research which 
most of us undertake. The irony of this line of analysis is self-evident. As economists we teach 
about the market process and use elaborate models to discuss that process; if we start our 
research with externally derived goals, however, we cannot use the models which we develop 
in class to evaluate that which we do in class. 

Setting this line of argument aside for the moment, other problems with the education 
model can be noted. We have indicated that use of the education model presupposes some set of 
goals and a means of measuring achievement of these goals. However, as my colleague Robert 
Staaf has pointed out, goals and testing techniques cannot be expected to fall from heaven like 
manna [20]. Rather, goals and testing techniques must be established through some collective 
decision process. The goals which do emerge from the collective decision process will be 
dependent on the voting rule that is used in the process. If the goals established by collective 
decisions vary with the voting rule which is employed, then there is reason to question that the 
goals which are established, given the voting rule, are meaningful. Using the public choice 
perspective, Staaf goes on to question whether or not efforts to determine what is good teaching 
in economics based, for example, on a simple majority voting rule, are very meaningful. In 
questioning the collective basis for determining the voting rule, he also questions whether the 
educational model is very useful. Even if one rejects Staaf's conclusion, his paper serves the 
important purpose of suggesting that economic educators must not only be concerned with 
goals and test instruments, but also with the means by which those goals and tests are 
established, that is, the collective decision-making process. 
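
The dependence of collectively chosen goals on the voting rule can be illustrated with a small sketch. The ballots, the three goal labels, and the two rules compared (plurality and a Borda count) are hypothetical choices made for illustration only; they are not taken from Staaf's paper.

```python
from collections import Counter

# Hypothetical ballots: each tuple ranks three candidate course goals, best first.
BALLOTS = (
    [("theory", "policy", "literacy")] * 4
    + [("policy", "literacy", "theory")] * 3
    + [("literacy", "policy", "theory")] * 2
)


def plurality_winner(ballots):
    """Return the goal with the most first-place votes."""
    firsts = Counter(ballot[0] for ballot in ballots)
    return firsts.most_common(1)[0][0]


def borda_winner(ballots):
    """Return the goal with the highest Borda score.

    With three goals, first place earns 2 points, second 1, third 0.
    """
    scores = Counter()
    for ballot in ballots:
        for rank, goal in enumerate(ballot):
            scores[goal] += len(ballot) - 1 - rank
    return scores.most_common(1)[0][0]


print("plurality:", plurality_winner(BALLOTS))
print("borda:", borda_winner(BALLOTS))
```

With the same nine ballots, the two rules select different goals, which is the sense in which a goal "established, given the voting rule" may tell us as much about the rule as about the preferences of the group.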

If the goals for economic education are established before research in economic education 
begins, it is all too tempting for researchers to conclude that the establishment of benefits from 
economic education justifies the establishment of programs in economic education. This is, in 
essence, the mistake which Milton Spencer makes when in his introductory textbook he writes 
to the student: "But in any case, whatever career you pursue, a knowledge of economics will 
make you a far more effective citizen, and this alone justifies the time devoted to its study" [18, 
p. 5]. Using the education model as the point of reference, it is easy to understand why the costs 
and benefits, and particularly the costs, of economic education are seen so seldom in economic 
education literature and why estimates of their values are so rarely introduced as arguments in 
regression analysis. In class we are constantly concerned with questions of whether or not 
government regulation, antitrust actions, tariffs, and criminal justice systems are worth their 
cost. Is it not reasonable, if only for the sake of consistency, for us to ask the same type of 
questions about what we teach? Shouldn't the opportunity value of the student's and instructor's 
time be considered in any decision about making economics a graduation requirement for high 
school or college or in any attempt to employ, for example, self-paced learning methods or 
criterion reference grading procedures or in evaluating how much students learn? 

Several economists have begun to consider the choice calculus of individual students and 
faculty in determining educational outcomes. In this regard, Robert Staaf in an important article 
in the Journal of Economic Education has led the way [19]. Whereas many economic 
educators had begun to wonder if anything will work in upgrading economic literacy, Staaf 
demonstrates with elementary tools of analysis that many of the techniques which have been 
used in the classroom on an experimental basis may in fact be working. However, by not 
considering student choice, we may have simply failed to make our studies sufficiently general 
and have failed to view the learning process as a problem in time allocation for the student. The 
student may transfer the efficiency benefits acquired in one course to his study of other subjects 
and to other noneducational activities. I suggested in a paper written subsequent to Staaf's that 
the professor may do the same thing, that is, he may transfer efficiency benefits to other courses, 
research and leisure activities [12]. The conclusion which may be drawn is that, even if a 
change in educational methods results in greater efficiency in the learning process, the student 
and faculty may disperse the efficiency benefits of any change over so many activities that it is 
impossible for present statistical techniques actually to show that efficiency benefits have been 
achieved. More recently, Allen Kelley [9] and Paul Kipps [10] have also made contributions 
in the application of economic models to economic education. 

Finally, the education model can be criticized on the grounds that it really is not a model of 
behavior in the sense that it yields refutable hypotheses. At its base, it assumes that student 
learning is a function of the numerous forces which come to bear on the student in the 
classroom. There is, therefore, no a priori basis for concluding that any one of these forces is 
either less or more significant than any other, nor is there any a priori basis for determining the 
directional influence of any force. The directional influence of any variable is simply that which 
is established in the regression equations. Statistical procedures are used to develop the 
"model" as opposed to testing one that is sufficiently well-defined to yield refutable 
predictions. When an unlimited number of "hypotheses" can be deduced from any theoretical 
frame of reference, we must question whether or not any one "hypothesis" is anything more 
than conjecture. Jerome Katz reminds us, "A hypothesis that fails to conform to the known 
facts is simply false, but a statement that fails to assert something beyond this is not a hypothesis 
at all but merely a report of past experience." It seems to me that in economic education we 
have predominantly been reporting our past experiences. 

The "Citizenship Argument" for Economic Education 

One of the most widely used arguments employed to encourage public and charitable 
support of economic education is that the study of economics will contribute to "improved 
citizenship" or a "better citizenry." By this it is generally meant that the coursework will in 
some way raise students' awareness of the political, economic and social system of which they 
are a part, increase the intelligence with which they vote, change attitudes, and increase 
people's participation in the political process. 

The citizenship argument has been articulated in a number of sources, but space 
limitations prevent one from showing the variety of ways the argument is used [see, for 
example, 3 and 8]. I can only stress that in all its varied forms, the citizenship argument implies 
a strong presumption that economics will affect students' political-citizenship ability and 
behavior. In its elementary form the argument is correct in the sense that it is internally 
consistent. If students take economics, if they learn the subject matter and retain what they 
learn, and if they employ what they know in their political-citizenship behavior, the result can 
be an improvement, as defined by economists, in the voting decisions of the public and a 
possible increase in the efficiency of government. The general welfare, as the argument goes, 
can be increased. 

The citizenship argument must be questioned on several counts, and I have undertaken 
that task in other papers [13-15]. Therefore, my comments will be brief here.³ The citizenship 
argument suggests that through transforming student-citizen behavior, economic education 
will produce a public good; and, indeed, if all act on the basis of the economic principles they 
learn, this can be the case. Herein lies the economic argument for public support of economic 
education. However, we must recognize that the argument is dependent upon group behavior 
and an assumption of group rationality. In class we teach that national defense and a host of 
other public goods must be produced publicly because individual citizens, who may 
recognize the benefits of the public goods, will have no incentive to provide them on a voluntary 
basis. A person may become a free rider because he knows that his individual contribution to 
the public good will have an insignificant effect on the benefits which he, himself, receives. 
Further, voluntary collective action to produce public goods is not likely to occur because of the 
extensive transaction costs involved in trying to bring about collective behavior from a large 
number of people who have an incentive to free-ride on what others may do. 

Does this argument also apply to the public good of economic education? The public 
choice theory developed primarily by James Buchanan [2], Anthony Downs [5], and Gordon 
Tullock [21] suggests that it does. They argue that because the individual citizen has 
essentially no effect on the outcome of the political process (that is, the type of public policies 
that he wants), he has no incentive to incur the cost of becoming politically intelligent and using 
the intelligence which he has acquired in his political behavior. Unfortunately, "intelligent" 
political decisions by the electorate constitute a very special kind of public good. If those 
decisions are to be free, then they must be made by individuals acting independently of one 
another. They cannot be produced like national defense; rather each individual must be free to 
determine voluntarily what his contribution to the public good will be. 

Through forced instruction and artificial incentives like grades given for classroom work, 
we may be able to overcome the free rider tendency to learn the type of economics designed to 
improve citizenship. However, we must recognize that there is a wide gulf between the 
classroom experience and the voting booth. Before the public benefits of economic education 
can be acquired through the political process, the student, upon leaving class, must have 
sufficient incentive to maintain the human capital stock that he has acquired in class. 
Furthermore, after he has left the classroom he must have sufficient incentive to employ that 
which he has learned in determining the political positions of candidates and the possible 
consequences of the alternative government policies that are proposed. Given the fact that most 
of us teach methods, as opposed to settled conclusions, the cost of retaining and using economic 
education in the political arena can be quite significant. Public choice theory predicts that the 
typical individual student-citizen will not have sufficient incentive to incur these costs. At the 
very least, public choice theory offers economic educators an hypothesis to refute and 
challenges them to refute it. The evidence, which is based on a study I undertook during the 1974 
election, supports the public choice hypothesis [13]. Support for the public choice hypothesis, 
which actually applies to a number of courses of study, is also found in the political science 
literature [6 and 11]. 

Finally, economic education must overcome the tendency of people, in spite of what they 
know about the economic merits of legislation, to vote their own private interests. I doubt very 
seriously that there are many textile executives trained in economics who will vote against 
tariffs on imported textiles. Indeed, the people who are most likely to remember their 
economics instruction are those who have an interest in manipulating government policy to 
serve their own ends. I suggest we look more carefully into how economic education is used by 
political entrepreneurs. 

Regardless, the citizenship argument poses a real dilemma for economic education. If the 
ultimate objective of economic education is to produce a "better citizenry," then we must ask 
what a better citizenry is and how the better citizenry can be produced. This requires that we 
seek common goals and procedures. If we do not address such questions and allow the 
economics profession to proceed in the development of the discipline in many directions, none 
of which is defined to be superior to the others, then we effectively do not have a reference point 
from which to judge whether or not economic education has contributed anything to "improved 
citizenship." There is no means for establishing that a public good, as opposed to a public bad, 
has been produced; there is no way to establish whether or not the benefits of the public good 
that is produced are worth the cost. This is because in an unconstrained environment, one in 
which no common objective has been established, any and everything counts. However, if we 
agree that common goals must be established, then we immediately confront the problems of 
collective decision-making within the economic education system. It is not clear that the goals 
decided by collective decision are necessarily superior to individually established goals which 
are at variance with the collectively established goals and procedures. Collectively established 
goals may reflect the views of the median voter group within the profession, but that does not 
necessarily make these goals superior to the goals of others. Again, if all economists were in 
perfect agreement as to what economic education should do, and if economists did not have 
private interests in establishing goals, the problems of collective decision making would not 
arise. However, I question whether we are as similar in our outlooks and as disinterested as 
many seem to suppose. 

Summary 

Individual economic educators have made some interesting and useful changes in their 
classrooms. However, the purpose of this paper has been to raise issues which have largely 
been avoided in the past and cannot be avoided in the future. I believe that we must critically 
reexamine the presuppositions of much of our work and make use of research being done by 
economists outside the economic education profession per se. Economic educators should 
make greater use of the tools of analysis which they teach in the study of what they teach. To do 
this, they must step back, away from the classroom and the particulars of their teaching 
experiences, and consider the student and the teacher as rational individuals who look upon 
economic education as one of many possible goods over which they must allocate their 
resources to maximize their individual welfares. 

We must ask some tough questions. Do voters really have sufficient incentive to become 
and remain economically literate and to employ what they know in the political process? 
Economics is one of a number of subjects which have potential external benefits. Assuming that 
students cannot learn everything, what claim does economic education have on public 
resources relative to other subjects? Can we argue for the introduction of economic education 
without opening up the public school curriculum to exploitation by all interest groups who 
believe that their subjects rightfully have greater claim to resources than economics? I note that 
the elementary school curriculum has been opened up to special educational interest groups, 
and the curriculum has literally become a hodgepodge of subjects. This development is taking 
its toll on basic educational skills. Finally, we must ask: Is the cost of raising the public's 
economic literacy, as defined, for example, by the TUCE, greater than, equal to, or less than 
the benefits achieved in government policy because of economics instruction? Although we 
may wish (or believe) that the benefits exceed the costs, this indeed may not be the case, 
particularly in the light of the problems the profession has encountered in raising the 
understanding of students. These are positive questions which go to the heart of what many 
perceive to be the justification for our existence. Answers may not only tell us what we are 
doing wrong but may also indicate why we have not done more. 

Footnotes 

¹All that I have said so far is not meant to suggest that there are no normative elements associated with 
economic theory. Coats [4], Nabers [16], Rothenberg [17], and Heyne [7] have all alerted us to the 
implicit values embedded in the economic method, and Boulding [1] has reminded us that the acceptance 
of the economic method as a mode of inquiry involves the acceptance of what he calls the "economic 
ethic," that is, the principle that the "goodness" or "badness" of any social event should be founded 
upon an evaluation of its rewards and costs as perceived by the relevant individuals. 

²Succinctly stated, the "citizenship argument" presupposes that people need economic education to make 
them more intelligent voters and, therefore, better citizens. 

³In these other papers, I have dealt [13] with counter-arguments to the public choice theory about to be 
briefly described and [14] with the design of courses for rational students. 

Student Performance and Changes in 
Learning Technology in 
Required Courses 

Robert J. Staaf 



Considerable controversy and confusion exist in this journal concerning the effectiveness 
of various input variables such as different teaching methods, textbooks and 
class size on student performance in introductory economics courses. In summary, 
the evidence suggests that these variables have an insignificant effect on student performance 
or that the available evidence is not conclusive. Possibly as a result of these inconclusive 
data and financial constraints, departments in many universities are adopting a 
policy of increased student/teacher ratios (large auditorium lectures) for "required 
introductory" courses. 

This article develops a model which offers a new approach to analyzing student 
performance and may provide insight into measuring the impacts of a change in the input 
variables on the learning process. It is interesting to note that most of the changes or 
innovations have occurred in introductory courses which for many students are "distributive 
requirements." The analysis suggests that for many students these course requirements 
may be "inferior goods," thereby leading to unexpected behavior as a result of a 
change in the input variables. 



Robert J. Staaf is Post-Doctoral Fellow and Visiting Lecturer, Center for Public Choice/Department 
of Economics, Virginia Polytechnic Institute and State University. 

Consider a hypothetical student, Albert, who is assumed to have average or moderate 
aptitudes in all fields of study. Or alternatively, Albert has no comparative advantage 
or disadvantage in any particular discipline. Assume Albert is a full-time student, earning 
some minimum number of semester hours (e.g., 4 courses or 12 semester hours).¹ 
Albert's moderate or average aptitudes are illustrated by Figure 1.² 

Assume the vertical axis represents knowledge in field (a) (e.g., social science), and 
the horizontal axis represents knowledge in field (b) (e.g., natural science). For expository 
reasons assume this knowledge can be measured and ranked objectively where a4 
represents greater knowledge than a3 and so on.³ Suppose Albert has an equally divided 
course load between (a) and (b).⁴ Assume within Albert's time constraint (i.e., semester) 
and given his aptitudes and present achievements (a0, b0 at the origin) he is able to achieve 
any bundle of knowledge (combination of a and b) on the line ranging from x1 to x5. 
Under these assumptions Albert's achievement at the end of the semester may range from 
a4 and b0 to b4 and a0 or any linear combination of (a) and (b) such as a3, b1; a2, b2; 
a1, b3.⁵ These assumptions allow us to specify the attainable set and the trade-off possibilities 
between (a) and (b). A change in either Albert's aptitudes or time constraint 
increases or decreases the attainable set. Achievement on the boundary is guaranteed if 
we assume Albert aspires for (a) and/or (b).⁶ The particular bundle that Albert selects 
will be a function of his relative preferences for (a) and (b).⁷ Note that the bundle selected 
reflects Albert's allocation of his time (semester) between the fields (a) and (b). 

Any bundle on the boundary of the attainable set may be thought of as being similar 
to Becker's full income approach [2, pp. 497-498]. Full achievement of any bundle (on 
the boundary) is attained by Albert devoting all his time and other resources at his command 
to learning activities with no regard for consumption or leisure activities. As with 
Becker's full income model, not all the student's time would usually be spent in studying 
and attending lectures. Time expenditures for sleep, eating and even some leisure are 
presumably required for efficiency in learning. 

Now consider grades that reflect relative achievement levels in terms of (a) and (b). 
If we assume that a4 or b4 represents an "A"; a3 or b3 a "B"; a2 or b2 a "C"; a1 or b1 a 
"D"; and a0 or b0 an "F," then the boundary of the attainable set represents a "C" average 
regardless of how Albert allocates his time.⁸ 
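
The claim that every bundle on the boundary yields a "C" average can be checked with a short sketch. It assumes the usual 4-point scale (A = 4 through F = 0), equal weighting of the two fields, and five boundary bundles x1 through x5 that trade one achievement level in (a) for one level in (b); the scale and the function names are illustrative assumptions, not part of the article.

```python
# Grade points on a 4-point scale (an assumption; the text names only letter grades).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
LETTER = {points: letter for letter, points in GRADE_POINTS.items()}


def boundary_bundles(total=4):
    """Bundles x1..x5: achievement level a in field (a) paired with total - a in field (b)."""
    return [(a, total - a) for a in range(total, -1, -1)]


def gpa(bundle):
    """Average grade points over the two equally weighted fields."""
    return sum(bundle) / len(bundle)


for name, (a, b) in zip(["x1", "x2", "x3", "x4", "x5"], boundary_bundles()):
    print(name, LETTER[a], LETTER[b], gpa((a, b)))
```

Every row reports the same average because the boundary is linear with a one-for-one trade-off, so a grade point gained in one field is exactly offset by a grade point lost in the other.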

¹The analysis is not changed by assuming a part-time student. 

²Aptitudes are defined as learning rates. Aptitudes are not assumed to involve a capacity or stock 
concept. 

³Criterion reference testing may approximate this idea. Knowledge is assumed to be broken into 
"bits" of information for which one has knowledge or does not have knowledge. This criterion is 
void of norms (e.g., a class or national norm). 

⁴This assumption is necessary since we are using a two-dimensional graph. The analysis can easily 
be extended to n dimensions. 

⁵A constant ability or aptitude rate of substitution is only one possibility. See [1] for an analysis of 
decreasing and increasing rates. While the ability rate of substitution may not be constant, there 
appears to be evidence that suggests it is negative. 

⁶Aspirations may be thought of as the dominance or nonsatiation axiom used in axiomatic choice 
models. Again, see [1] for a fuller account of this notion. As will be noted later, by imposing grades 
over (a) and (b), the boundary is guaranteed by the assumptions of Albert's moderate aptitude and 
preference for college and a degree over not being in college and no degree. 

⁷More specifically, his personal rate of substitution. The allocation of his time and, therefore, his 
achievement in (a) and (b) is determined when his personal rate of substitution equals his ability rate 
of substitution. 

⁸It is unlikely, given grading practices based on a "normal curve," that the attainable set boundary is 
linear; rather it is more likely to be convex. Therefore, a student strategy of maximizing G.P.A. will 
not lead to devoting all his time to one field. 

This highly restrictive model reveals that a minimum G.P.A. of C can be attained 
by Albert allocating his time among a number of alternative achievement bundles (i.e., 
x1, x2, x3, x4, x5). Absolute specialization in either field (x1 or x5) is unlikely under the 
conditions that Albert is required to take both (a) and (b) courses in a semester. A corner 
solution (all F's in one field) would mean Albert having to earn additional credits towards 
the degree. These additional credits would entail further time expenditures. Therefore, 
for practical purposes we can ignore x1 and x5 as being deliberately chosen. The above 
model assumes Albert is endowed with moderate or average aptitudes. If Albert were 
endowed with high aptitudes in all areas, he would be able to attain an A average without 
having to sacrifice A achievement in one field at the expense of A achievement in another 
field. The bundle x* on Figure 1 represents the well-endowed student. A trade-off curve 
paralleling the line x1x5 could be drawn that intersects x*. However, x* is a determinate solution 
if this student is a (G.P.A.) maximizer. Achievement beyond a4 (e.g., a4 + α) comes at the 
expense of a lower achievement in b which is less than b4. The bundle x* (a4, b4) represents 
a straight A average while any other bundle on the boundary (dotted line) intersecting 
x* would represent a lower G.P.A. (e.g., a4 + α and b4 - α does not yield an A average). 
The marginal product in terms of "grades" (not achievement) for the well-endowed 
student is negative for a reallocation of time that deviates from x*. Between these two 
extreme cases of moderate aptitudes and exceptional aptitudes we might expect some 
trade-offs. In addition, even the exceptional students who have accumulated enough high 
grades in past semesters may to some extent trade potentially high grades for lesser 
grades in the current semester and more leisure and consumption time. Some students 
may have a high aptitude or above average (e.g., B) aptitude in one area and a below 
average (e.g., D) aptitude in another area. In the absence of course requirements, a stu- 
dent may be assumed not to choose any courses for which his aptitudes are low unless his 
preferences are strong for this field and he is willing to pay the price of a lower G.P.A. 
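
The argument that a well-endowed grade maximizer stays at x* can be sketched numerically. The sketch assumes grade points that rise one-for-one with achievement up to the "A" level and are capped there, with the shift size called alpha; the continuous scale, the cap value, and the function names are assumptions introduced for illustration.

```python
def grade_points(achievement, cap=4.0):
    """Map achievement to grade points; achievement beyond the A level earns no extra points."""
    return max(0.0, min(achievement, cap))


def gpa_after_shift(alpha, a_star=4.0, b_star=4.0):
    """G.P.A. when alpha units of achievement are moved from field (b) to field (a), starting at x*."""
    a = a_star + alpha
    b = b_star - alpha
    return (grade_points(a) + grade_points(b)) / 2


for alpha in (0.0, 0.5, 1.0, -1.0):
    print(alpha, gpa_after_shift(alpha))
```

Under these assumptions any move away from x*, in either direction, lowers the average, because the extra achievement in one field is not rewarded while the lost achievement in the other is penalized.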

For the moderately endowed student, a trading off of low grades (e.g., D's) in one 
area for high grades (B's) in another area may be quite rational. For example, courses 
(a) and (b) in Figure 1 may be interpreted as "major" course requirements and "distributive" 
course requirements, respectively. Since the choice of a degree bundle is in 
large part a preference for a particular major, and distributive requirements are beyond 
the control of the student, one would expect that the major course requirements are preferred 
over distributive course requirements. Similarly, we can assume that graduate 
schools and most prospective employers would weigh performance in the major field over 
performance in the distributive courses. 

[Figure 1: knowledge in field (a) on the vertical axis, knowledge in field (b) on the horizontal axis] 

David Riesman provides some evidence that tends to support a trade-off policy even 
for those students who are well endowed (e.g., x*) [3, p. 42]. After trying in vain to per- 
suade some of his students that they could think less about grades and more about educa- 
tion, Riesman says that one of his graduate students at Chicago finally did a thesis which 
documented his arguments. The student asked departments which graduates they had 
recommended for jobs, or advanced training or fellowships and then interviewed the 
students and looked at their grades. 

"He concluded that those students frequently fared best who were not too obedient, 
who did not get an undiluted, uncomplicated, straight-A record. (The straight-A 
students, in fact, sometimes slipped away without anyone's noticing.) 

The students who were most successful were a bit rebellious, a bit offbeat, 
though not entirely 'goof-offs'; these were the students likely to appeal to a faculty 
member who had not entirely repressed a rebelliousness of his own that had led him 
to be a teacher in the first place, a faculty member who was looking for signs of life, 
even if they gave him a bit of trouble at times. To be sure, such a student had to do 
well in something to earn this response, but he was often better off to have written a 
brilliant paper or two than to have divided his time, as an investment banker his 
money, among a variety of subjects." 

The model suggests that students may not exert very much effort in some classes 
even in spite of the fact that the faculty has the power to give low grades which affect a 
student's chance of surviving in school. Furthermore, if low grades can be neutralized, 
the model suggests that students reveal their preference intensities over their course 
work. Therefore, it is perfectly rational for students to decide on different strategies in 
the allocation of their time. For the moderately endowed student a trade-off in favor of 
"major" course requirements would appear to be the optimal allocation of his 
time. In addition, student behavior that in effect anticipates low grades in some courses 
does not necessarily mean that the student does not have aspirations for these courses; it 
simply means that given his preferences, aptitudes and time constraint he has chosen to 
behave this way to maximize his satisfaction. An increase in either aptitudes or time may 
change his behavior. According to the model developed, there are several reasons why 
students may be more motivated (expend more time) in major courses, whether they be 
required or electives. Of course, electives are by definition the prerogative of the student 
and therefore would presumably be chosen on the basis of his preferences. Therefore, 
student behavior in terms of time allocation may explain in part the evidence that tends 
to suggest that large auditorium classes (usually required courses) are no less effective 
than classes with smaller student/teacher ratios. If the student has decided on a particu- 
lar allocation of time at the outset (beginning of the semester), it may be questionable 
whether the professor is able to affect even marginally his allocation choice given student 
preferences and constraints. If the introduction of new teaching techniques or texts is 
designed to simply change student "preferences" and, therefore, a student's time alloca- 
tion, the model suggests that such techniques may not be very successful in required 
courses since trade-offs must be made in more "preferred' area, such as a student's 
major. 

Technological Change?

On the other hand, assume that these techniques are effective in extending the
boundary of the attainable set. That is, changes in teaching techniques, textbooks and
class size really do make a difference given a student's aptitudes. Figure 2 is identical to
Figure 1 except that a change in teaching techniques or materials is introduced which is
illustrated by the dotted line.

[Figure 2]

Assume (b) represents required "distributive" courses and (a) "major" courses. 
Techniques that are technologically effective are defined as increasing a student's 
"apparent" aptitude in the technologically affected area or course. These techniques are 
assumed to be external to the student and do not require increased student inputs such as 
time and intellectual effort. Our hypothetical student is now able to achieve b 6 if he
devoted all his time to (b). If performance standards (correspondence between achievement
and grades) do not change, the effect of introducing learning technology in area (b)
is to lower the relative price of (b) represented by the dotted line. Indeed this may be the
intent of introducing these techniques in required courses so as to induce a substitution
effect towards (b), thereby tempting students to specialize (major) in (b).

However, an income effect is also associated with the relative price change in (b). 
The introduction of new techniques in (b) allows the student to allocate more time to (a), 
thereby increasing his grade in (a) without affecting his prior achievement level or grade 
in (b). 9 Given the indifference map illustrated in Figure 2, the substitution effect is almost 
completely offset by the income effect. 10 That is, courses in (b) are inferior goods. The 
net increase in achievement resulting from the change in technology of field (b) is 
(b't - bi) which may not be statistically significant. 11 However, the technological change 
allows the student to allocate more time to his major field (a), thereby allowing him to 
increase his achievement (a'i - a3) and grades without significantly affecting what would 
have been his achievement level (bi) and grades in the distributive courses (b). The 
assumption that required courses may be inferior goods for some students does not seem 
to be totally unrealistic. Therefore, studies that concentrate on changes in achievement 
levels in the technologically affected courses while ignoring effects in other courses may 
not find statistical differences if the courses are inferior goods. 
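
The spillover argument can be made concrete with a small numerical sketch. It is an illustration only and not the indifference-map model of Figure 2: it assumes a hypothetical student who aims for a fixed target achievement in the required course (b) and gives every remaining hour to the major (a), and it treats achievement as hours multiplied by a productivity rate. The function name, the rates, the hours and the target are all assumptions chosen for illustration.

```python
def allocate_time(total_hours, rate_a, rate_b, target_b):
    """Split study time between a major course (a) and a required course (b).

    Assumption (not from the paper): the student spends just enough time on
    (b) to reach target_b and devotes every remaining hour to (a).
    Achievement is modeled as hours * rate, where the rate stands in for the
    student's "apparent" aptitude in that course.
    """
    hours_b = min(total_hours, target_b / rate_b)
    hours_a = total_hours - hours_b
    return {
        "hours_a": hours_a,
        "hours_b": hours_b,
        "achievement_a": rate_a * hours_a,
        "achievement_b": rate_b * hours_b,
    }


# Control group: no new technique in the required course.
control = allocate_time(total_hours=40, rate_a=1.0, rate_b=1.0, target_b=10)

# Experimental group: the new technique raises apparent aptitude in (b) only.
treated = allocate_time(total_hours=40, rate_a=1.0, rate_b=1.25, target_b=10)

change_in_b = treated["achievement_b"] - control["achievement_b"]
change_in_a = treated["achievement_a"] - control["achievement_a"]
print(f"change in required-course achievement: {change_in_b}")
print(f"change in major-course achievement:    {change_in_a}")
```

Under these assumptions a study that measures only achievement in (b) would find no difference between the groups, although the technique released two hours of study time that appear as a gain in (a).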



9 The solid line may be taken to represent a control group of students with similar aptitudes to those
of the experimental group (dotted line) that is introduced to the new learning technology.
10 Note that preferences are defined over a field of choice that represents achievement and grades.
11 In addition, it is conceivable that preference maps may be such that Giffen's Paradox occurs.



Some Suggestive Evidence 

Studies on pass/fail tend to support trade-offs in a student's time allocation pattern
over his course load. Note that pass/fail is not assumed to increase the student's attainable
set in terms of achievement. 12 This evidence only suggests that students do make
trade-offs when given the opportunity to do so at lower costs. A study at Dartmouth
College by Feldmesser [4, p. 63] revealed that the most distinctive characteristic of the
option was that it was a way of reducing the burden of distributive requirements. 13 Users
of the option tended to receive a full letter grade lower than nonusers regardless of a student's
cumulative G.P.A. Further evidence suggests that of two students (of similar
abilities) taking a course in their major field, the one using the option in another course
would average about half a grade higher than the one not using the option, while the grade
deficit of the user in his option course does not affect his overall G.P.A. [4, p. 116]. While
high G.P.A. students more or less made up in a course taken for the major what was lost
in a course taken in the option, no such compensation occurred among low G.P.A. students.
The lower achievement effects of the option in the course in which it was being
used seemed to spill over into other courses such as their major [4, p. 133]. The time
released seems to have been expended in other activities besides course work.

The above-mentioned evidence is highly suggestive of students revealing their preferences
over their course work when a change in the relative prices is brought about by the
introduction of a pass/fail system. The evidence supports the notion that distributive
course requirements are inferior goods. 14

Conclusions 

The analysis suggests that the role of distributive course requirements requires a
study in itself. Statistical studies that focus only on one course from a student's bundle of
courses are likely to overlook the spillover impacts of a technological change in teaching
techniques. It is not at all clear whether these teaching techniques are efficient in the sense
of increasing a student's attainable set or are simply attempts to change student preferences. 15
In any event, the model presented suggests a new methodological approach to
determining if these techniques are efficient by focusing on a student's time allocation
pattern and substitution and income effects.

13 Other studies on pass/fail reveal similar results. However, Feldmesser's methodology appears to
be more satisfactory since he concealed the intent of the study and considered a student's entire
bundle of courses as well as obtaining data on student time expenditures in various courses.

14 Advocates of pass/fail implicitly argue that the option course is treated like an inferior good because
the option is restricted and therefore changes relative prices. The inference is that a universal
system would have income effects only.

15 Note that it is conceivable that these new techniques may decrease the attainable set. In this case,
a student may be forced to spend more time in the required course to maintain a passing grade at
the expense of sacrificed achievement in other courses (e.g., major).

Research in Economic Education:
Are Our Horizons Too Narrow?

1. Introduction

There has been an increased awareness in recent research literature that the earlier 
experiments in economic education have been limited in both scope and experimental design. 
Two directions for further research have been suggested. The first suggestion has been to 
improve the technical aspects of evaluation. An example of this is Soper [15, p. 40], who 
claims that "research in economic education at the collegiate level has now progressed to the 
point where professionally acceptable results require the use of normed, reliable and validated 
instruments (such as the TUCE) and O.L.S. regression models" and that "this field of research
has now reached a level where increasing sophistication in both theoretical specification of
models and empirical techniques is necessary if further advances are to be made." 1 The second
suggestion has been to extend the scope of this evaluation. An example of this is van Metre [16,
p. 101], who argues that the first approach is too limited and that "economic educators should
stop looking for the best economics teaching method and, instead, look for flexible learning 
systems which can embody appropriate teaching methods as indicated by the type of learning 
outcomes desired and the learner characteristics of the particular students involved." 2 

The aim of this paper is to point out that if the only direction taken by Soper's so-called
second generation research is the use of second generation econometrics, then the problem of
"poorly drawn conclusions and inappropriate experimental design" [16, p. 95] may be
exacerbated. The desire for "professionally acceptable results" has already led to a 
concentration on measures of output for which "normed, reliable and validated instruments" 
are available. As a result of this, conclusions regarding the efficacy and desirability of the 
teaching methods in question have been made on the basis of these readily available and easily 
measured outputs. Less easily measured outputs have been ignored. This has occurred even 
though one method of teaching can only be described as better than another if it is possible to 
assess all the objectives of a course unambiguously and to decide what the weight of each 
objective should be. Implicit in the conclusions of much of our research data is that the major 
criterion or objective of economic education is a very narrowly defined concept of learning, 
related in some way to student performance as measured by achievement or course grade. 

In the following sections it is argued that student performance is not the only criterion
which could be used in evaluation and that once the possibility of multiple criteria is
considered, there will be difficulties in assigning them weights. The conclusion is that there is a
danger of our research horizons becoming too narrow because of these inherent difficulties, and
it is suggested that this possibility can be avoided by a conscious attempt to encourage research
in both and not just one of the two directions suggested.

Dr. Judy Yates is a lecturer in economics at the University of Sydney. 
SOURCE: Journal of Economic Education, vol. 10, no. 1, Fall 1978, pp. 12-17. Reprinted with permission of the

2. A Single Criterion for Evaluation?

Before considering the question of what the major criteria should be for evaluating a 
particular teaching method used in economic education, it is important to consider the much 
broader question of the objectives of education in general. Johnson [S] has argued that the 
objectives of higher education can be to produce human capital, to serve as a source of
knowledge, or to act as a filter for the labor market. Although an individual course of itself
cannot bear the burden of all of the objectives of the whole education system, it must surely be
recognized that the course is just a part of this whole system and cannot, therefore, be treated in 
isolation. The contribution that a course makes toward the objectives of the education system 
must always be borne in mind. These objectives are often ill-defined and may vary significantly 
among institutions (for example, between universities and colleges). At the risk of overlooking 
important ones, a number of objectives can be listed: 3 students' growth and development with 
regard to knowledge and skills (which can include application, critical thinking, creativity, 
motor skills, etc., as well as comprehension and understanding); their social development 
(including leadership, communication, interpersonal relations, etc.); their acquisition of 
vocational skills, and so on. Many of these skills will not be measured in any way by the use of a
TUCE (Test of Understanding in College Economics) type of test; nor will they be developed by
many of our approaches to education. Concentration on formal learning and the related skills of
analysis quite possibly arises from a questionable belief that those qualities can be objectively 
assessed. 4 However, such concentration may also result from treating a specific course in 
isolation rather than as a part of the whole education process. 

Objectives such as an increase in knowledge and/or skills at varying degrees of 
sophistication which are evaluated by achievement or course grade are obviously part of what is 
being sought. However, the fact that they continue as major objectives despite the evidence of
lack of anything like perfect retention one or two years later 5 could be taken to indicate that this 
specific knowledge is, of itself, not the sole objective of the course in question but that it is 
possibly being used as a proxy for some other objective. It may be, for example, that what needs 
to be considered and what is implicit in our choice of output measure is the process by which 
this output is achieved rather than the output itself. 6 The contribution of a particular technique 
or teaching method to the process of learning how to learn rather than to the specific output
of precisely what was learned may be its most important attribute. Allison [2] has suggested that
learning how to learn does take place from one semester to the next and that the extent of this
process might vary according to the teaching method employed. If the process is regarded as
being as important as the output, there may be a need for more subtle evaluation measures
which can distinguish between the contribution of a particular method to the process of learning
and its contribution to the outcome of learning. In some teaching methods students are required 
to organize the material for themselves; in others it is done for them. In some teaching methods 
students have the discipline of steady progress through the subject imposed upon them; in 
others, they are left to their own resources. These teaching methods may reveal no apparent 
differences in their effect on achievement, but if either the ability to organize one's own 
material or the self-discipline required to organize one's self is desired as an objective in its own
right then the effect that the choice of teaching method has on the possibility of attaining these 
must also be taken into account. 

To illustrate the potential weakness of basing conclusions on consideration of just one 
single objective, Soper's own work can be cited. 7 His and Thornton's fairly thorough study of a 
Keller-plan approach to economic instruction [14, p. 89] claims to have shown "that a 
completely self-paced teaching format for macroeconomics is inferior to a well-directed 
concept-oriented, graduate-student instructed, lecture-discussion taught course." This conclu- 
sion is based on no discernible improvement in grades and is made despite a long list of 
potential criteria which were not considered but by which such an approach could have been
assessed. As an indication of some of these criteria, in addition to the ones already raised, those
designated by Allison [1] could be mentioned. She lists as benefits the fact that a self-paced
system has a nonthreatening system of assessment and hence a resultant reduction in student
anxiety; that it results in increased student control over grades; that there is a strong preference
for this method over the conventional alternative; that it has a virtual guarantee that all students
will achieve mastery of the subject; and, finally, that it is exciting for the instructor to introduce
and conduct a new teaching method which is so favorably received by students. In addition, in a
later paper [2] she also suggests that the organization of material which occurs in a self-paced
approach may help students in learning the how-to-learn process discussed earlier, particularly
in the early stages of higher education. To offset these potential benefits, such a system might
have an adverse effect on organizational ability and on self-discipline. All these potential
benefits and costs are possible objectives of a course and even if they are not necessarily
primary objectives, consideration of them may well weaken the unambiguous conclusion
reached by not taking them into account.

3. Which Criteria?

It follows from section 2 that the problem of choosing between two alternative approaches 
to teaching should not be the simple procedure implied by much of the current research. The 
problems of evaluation of a particular course are magnified once this course is seen as a part of 
the whole education system, since its evaluation cannot really be independent of an evaluation
of that whole system. There will always be the difficulty of deciding on exactly how much
consideration should be given to those objectives that are system-oriented compared with those
that are subject-oriented. The difficulty in assigning these weights results in part from the
ill-defined nature of many of the objectives, in part from a lack of consensus about them, 8 and in
part from the conflict between students and staff in their attitudes toward those objectives.

It is not improbable that students' attitudes are such that they develop the skills of
surviving the system with a minimum of effort at the expense of those skills more related to a
disciplined search for knowledge. 9 Kelly [10] has used this possibility of conflict between
multiple objectives to explain his seemingly contradictory findings that, although the
introduction of a specific technology (namely, the distribution of printed lecture notes) had a
negative impact on achievement, students were prepared to pay a positive price for this
technology. He argued that this finding can be explained by recognition of the impact of the
particular technology on the allocation of the students' time and by recognition of the
distinction and trade-off between "educational outputs" and "leisure outputs." He concluded
that "policy implications relating techniques to increased educational efficiency are clearly 
sensitive to a specification of whose values are used to identify and price the outputs associated 
with the instructional process." 

Dual or multiple objectives at the system level, such as the teaching and research roles of a 
higher education institution, also have implications for the evaluation of experiments in 
economic education. For example, the questions of whether research or teaching is more 
desirable and whether there is a trade-off between them or whether they are complementary 10 
are of fundamental importance. They must be answered before choices can be made between 
teaching techniques that have different implications for staff involvement regardless of whether 
the techniques have different effects on the other objectives being considered. 

A final and important point concerning the weighting of objectives can be made by 
reference to the distribution of outcomes. Many studies have shown that sex, age, ability,
effort, etc., are all determinants of outcome (again, usually achievement, but occasionally
attitudes or enjoyment) and fewer, but still a significant number, have actually considered the
differential impact of various techniques according to these variables. Those studies which
have considered distributional impact have shown that the same factors that determine outcome
are also the major factors that need to be considered when looking at the distributional effect. 11
Given that the outcomes of a particular approach can vary according to sex, or ability, or
whatever, the decision to choose one method over another becomes an ideological matter.
Unless explicit value judgments are made as to what the desirable distribution should be, it
becomes impossible to decide which of two technologies that may have identical average
outcomes but different distributional effects is "better." Conversely, of course, any such
choice does reflect implied judgments, whether intended or not. 
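
The point about identical averages and different distributions can be illustrated with a short sketch. The two methods, the two ability groups, the gain figures and the weights below are all hypothetical and are not taken from any of the studies cited; the groups are assumed to be of equal size.

```python
from statistics import mean

# Hypothetical achievement gains by ability group for two teaching methods.
gains = {
    "method_A": {"low_ability": 4.0, "high_ability": 8.0},
    "method_B": {"low_ability": 8.0, "high_ability": 4.0},
}


def average_gain(method):
    """Unweighted mean gain across groups (groups assumed equal in size)."""
    return mean(list(gains[method].values()))


def weighted_gain(method, weights):
    """Gain after applying an explicit value judgment (a weight) to each group."""
    return sum(weights[group] * gain for group, gain in gains[method].items())


def preferred(weights):
    """Return the method with the larger weighted gain, or 'tie'."""
    a = weighted_gain("method_A", weights)
    b = weighted_gain("method_B", weights)
    if a == b:
        return "tie"
    return "method_A" if a > b else "method_B"


equal = {"low_ability": 0.5, "high_ability": 0.5}
favor_low = {"low_ability": 0.75, "high_ability": 0.25}
favor_high = {"low_ability": 0.25, "high_ability": 0.75}

print(average_gain("method_A"), average_gain("method_B"))
print(preferred(equal), preferred(favor_low), preferred(favor_high))
```

With these invented figures the two methods cannot be ranked on average outcome alone; the ranking appears only once weights are chosen, and it reverses when the weights do.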

This distributional aspect is also relevant in considering the impact of various technologies 
on a student's rate and method of learning. The point will not be developed here because the fact
that students do have different rates of learning has been well documented and taken into
account explicitly by the various attempts at introducing self-paced courses. The second aspect,
that students have varying methods of learning and that this may be an explanation of the
distributional effects, has been argued more fully by the author [17] and by van Metre [16], but
has possibly not received as much attention in economic literature as it ought. 12 

4. Conclusion 

In conclusion, it is claimed that economic education will almost certainly have many 
objectives, some of which may be conflicting and some complementary. Different approaches 
to education will have different effects on each of the objectives both in terms of outcomes and 
in the distribution of those outcomes. Continued research in the direction suggested by Soper 
will lead to increased information about the relative efficiency and distributional impact of 
various approaches with respect to just one of the many possible objectives. It will provide
valuable information, but it will not, and should not, provide an adequate base on which to
make "best" choices. Narrow horizons, chosen simply because they lead to "professionally
acceptable results," must be avoided. Information is needed about the relative effectiveness of
each of the various approaches on all of the possible objectives, even if readily acceptable 
instruments are not available. This information and a knowledge of the likely distributional 
effects of the available alternatives will enable educators to make their own considered value 
judgments as to what they think will be the most effective means of achieving stated or even 
unstated objectives. 

In the absence of a consensus concerning the desirability and importance of certain 
objectives, the suggestion that we should be looking toward the adoption of more flexible 
learning systems (including assessment systems 13 ) appears to be soundly based, since if this is 
done there is less danger of too much weight being placed on a single outcome or a single 
distribution of outcomes. 



FOOTNOTES 

1 He argues that the use of nonrigorous theoretical and empirical procedures may have resulted in incorrect
evaluations of particular educational approaches.

2 His argument is essentially based on the claim that any particular course will have multiple objectives and
that these are best achieved by different techniques for different students because different students have
different methods of learning. As was pointed out by the author in an earlier paper [17], these same
conclusions, based on very similar reasoning, were suggested by Bligh [3] and were supported by
reference to a very extensive range of education research literature covering almost every field of study.

3 These have been taken from [13].

4 Increasing doubts have arisen over the objectivity of the measures used. Kipps et al. [10], for example,
concluded that the specification of the dependent variable (that is, whether as absolute achievement,
absolute improvement, percentage improvement, or gap closing score) could bias the results of any
regression equation and so warned that "any recommendation regarding teaching methods based on such
statistical findings, therefore, might be unwarranted." Unless the instructor is able to claim unequivocally
that he is aiming to improve the gap-closing score, or the absolute improvement, or whatever over
the whole class, then Kipps's results make interpretation of existing findings less objective than originally
thought. Likewise, Allison [2] points out that single-equation models can only provide "information on
achievement and not on other outputs such as enjoyment or concentration" and concludes that it is quite
possible to introduce bias into the coefficient estimates by ignoring any simultaneity of these factors,
hence making them an unreliable base for conclusions or recommendations. On similar grounds, the
Soper-Becker-Highsmith exchange on research methodology, which appeared in the Fall 1976 issue of
this journal, by raising problems of multicollinearity, stability of coefficients between groups,
measurement errors, and errors in specification, clearly points out the difficulties inherent in the search
for an objective measure.

5 Craig and O'Neill [5] suggest that the decay factor is approximately 25 percent for economics over this
period, a higher figure than that suggested in other studies in economics they mention, but considerably
better than the 94 percent loss they report for science education.

6 The possibility that the process may be important is raised in [6].

7 Soper's study is, of course, not the only example available. A multitude of studies have reached
essentially the same conclusions on the basis of essentially the same types of results.

8 Horton and Weidenaar [8] conducted a survey among some two hundred economists in an attempt to
isolate a single goal for economic education. However, despite their use of the iterative Delphi procedure
they were unable to determine agreement on a single goal. They argued that this was necessary because
"without such a single goal we disperse our efforts, confuse them, and develop no solid base for the
specification of instructional objectives and no systematic one for the evaluation of our efforts."
Although I recognize the value of their attempts, I do not agree with their rationale.

9 This may be a result of a conflict between the objectives of higher education of serving both as a filter for
the labor market and as a source of knowledge.

10 See, for example, McDonough and Kannenberg [12] for one particular aspect of this discussion.

11 Hansen, Kelley, and Weisbrod [7] illustrate the point being made here with the aid of a simple, borrowed
diagram based on only one characteristic (ability) and one output criterion (course grade). They show that
it is possible for two methods to be equally effective on average but for one to favor high achievers and the
other low achievers.

12 It does receive considerably more attention in education literature.

13 There has been wide discussion concerning the importance of the right choice of assessment method
because the choice made affects the perceived achievement of the stated (and even unstated) aims of the
course and determines whether the intended aims actually will be the ones assessed. See, for example, [4]
for a survey of this discussion. Van Metre [16] adds several examples relevant specifically to economic
education. This same point (that is, a plea for flexibility) must surely apply to the evaluation instruments
being used as well; tests such as those mentioned by Soper have been shown to test knowledge and
understanding and simple and complex analytical skills, but whether such tests, nationally validated or
not, can be used to test creativity, maturation and so on is a very debatable question.
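
The point in footnote 4 about the specification of the dependent variable can be illustrated numerically. The sketch below assumes the usual definitions of the four measures rather than quoting them from Kipps et al.: absolute improvement is posttest minus pretest, percentage improvement divides that difference by the pretest score, and the gap-closing score divides it by the points the student could still have gained after the pretest. The two students, their scores and the maximum score are invented.

```python
MAX_SCORE = 30  # hypothetical maximum test score


def outcome_measures(pre, post, max_score=MAX_SCORE):
    """Return four alternative specifications of the dependent variable."""
    improvement = post - pre
    return {
        "absolute_achievement": post,
        "absolute_improvement": improvement,
        "percentage_improvement": improvement / pre,
        "gap_closing": improvement / (max_score - pre),
    }


# Two invented students: (pretest score, posttest score).
students = {"student_1": (10, 20), "student_2": (20, 26)}
results = {
    name: outcome_measures(pre, post) for name, (pre, post) in students.items()
}

for measure in results["student_1"]:
    best = max(results, key=lambda name: results[name][measure])
    print(f"{measure}: higher for {best}")
```

In this invented case the weaker entrant does better on absolute and percentage improvement while the stronger entrant does better on absolute achievement and on the gap-closing score, so the apparent "winner" depends on which measure is chosen.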
ECONOMIC EFFICIENCY AND THE DISTRIBUTION OF
BENEFITS FROM COLLEGE INSTRUCTION* 

By W. Lee Hansen, Allen C. Kelley, and Burton A. Weisbrod 
University of Wisconsin 



Economic efficiency implies an equating, at the
margin, of benefits and costs. In this paper we
explore a concept of "efficiency" which is broader
than the usual framework and which applies to
commodities and services produced and distributed
largely outside the private, profit maximizing
sector. An assessment of the economic efficiency
of producing such a commodity requires
the determination of its outputs and the valuation
or weighting of these outputs. 1 Our principal point
is that these weights, in turn, depend on who
receives the outputs; thus, distributional issues
are at the heart of economic efficiency studies
involving a wide range of activities undertaken in
the governmental and private, nonprofit sectors. 2
One of these activities, the production and distribution
of college instruction in economics,
illustrates well the significance of this particular
approach to the analysis of economic efficiency.

We argue that in analyzing the economic efficiency
of instruction, distributional issues (that
is, who receives the benefits) should be considered
explicitly; if not, they will necessarily be
considered implicitly. The pervasive failure to
include distributional issues in efficiency studies
suggests an excessively narrow concept of efficiency.
This is particularly inappropriate in evaluating
instruction, since in education, as in most
services, decisions regarding what to teach and
how to teach have a strong influence on who
receives the benefits. 3

I 

Total benefits from instruction are a function 
of the amounts gained by each student and of the 
values of each amount; these values vary among
different students or types of students. Therefore, 

* The authors are grateful to their colleagues, R.
Andreano, F. Golladay, R. Lampman, and E. Smolensky,
for comments on an earlier draft.

1 We assume throughout that output is measured in
incremental, units-added terms.

2 This general point has been developed in detail by
B. A. Weisbrod [17]. In his study the concept of
efficiency which encompasses the traditional view of
efficiency as well as the distribution of output is termed
"grand efficiency."

3 This is true of most services because of the requirement
that the consumer be present at the time and
place of production (consumption) of the service.



aggregate benefits are a function of how the outputs
are distributed among students. Symbolically,
the marginal benefits, $B_k$, from resources
employed in an instructional approach (technique
and/or course content) $k$ is:

$$B_k = \sum_{j=1} \frac{\partial q_j}{\partial k} \, \frac{\partial b_j}{\partial q_j} \tag{1}$$

where

$q_j$ = the quantity of output produced by the
input mix $k$ and received by student $j$; 4 and

$b_j$ = the value of benefits (output) accruing to
student $j$.

Of the two partial derivatives the first is the
marginal physical product of input (or input mix)
$k$ for student $j$, and the second indicates the
valuation of the marginal product.

As expression (1) reveals, the importance (both 
in quantity and in value) of any particular form 
of output may vary with the type of student re- 
cipient. For a student planning graduate work 
certain course outputs may have great value, 
whereas these same outputs may be of slight value 
to the student who plans to continue no further 
in the field. Similarly, instruction about behavior 
of the stock market may contribute greatly to 
the knowledge of persons from disadvantaged 
backgrounds while adding little to the knowledge 
of other students. 
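
A discrete analogue of expression (1) may make the role of the subscript j easier to see. The sketch below is illustrative only: the two student groups, their marginal products and their marginal valuations are invented numbers, and the class and function names are assumptions rather than anything defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class StudentGroup:
    name: str
    marginal_product: float  # extra output q_j from one more unit of input k
    marginal_value: float    # value b_j placed on one more unit of q_j


def marginal_benefit(groups):
    """Discrete analogue of expression (1): sum over j of (dq_j/dk) * (db_j/dq_j)."""
    return sum(g.marginal_product * g.marginal_value for g in groups)


# Invented figures for two types of student.
groups = [
    StudentGroup("plans graduate work", marginal_product=3.0, marginal_value=3.0),
    StudentGroup("plans no further study", marginal_product=1.0, marginal_value=0.5),
]

b_k = marginal_benefit(groups)

# Disregarding the subscript j: treat every student as the average student.
avg_product = sum(g.marginal_product for g in groups) / len(groups)
avg_value = sum(g.marginal_value for g in groups) / len(groups)
b_k_homogeneous = len(groups) * avg_product * avg_value

print(b_k, b_k_homogeneous)
```

With these figures the group-by-group sum and the "average student" shortcut give different totals, because output and valuation vary together across the groups.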

Most studies of instructional efficiency or 
studies which appraise the merits of particular 
teaching techniques have not estimated — nor 
have they even considered — the impact of alter- 
native teaching approaches (input mixes) on the 
distribution of outputs among students. Neither 
have they examined the possibility that the value 
of outputs varies according to the distribution of 
the output. In terms of the model, the subscript 
j has been entirely disregarded. The assumption 
implicit in such a simplification is either that 
students are a homogeneous group — each student 
receives the same amounts of outputs from a 
given course and the outputs have the same value 

* In practice, the summation will be over groups of
students who possess roughly similar and "relevant"
attributes. Parenthetically, it might be noted that if
j = 1, expression (1) represents the value of an input used
to produce a private good.
for each student — or that students should be
treated as if they were a homogeneous group. 
Neither assumption seems reasonable, for rather 
obvious reasons. 

This discussion has assumed implicitly that it 
is students' valuations that should count. There 
are many who feel, however, that valuations 
should be made by other groups, such as the fac- 
ulty, the college administration, the taxpayers, 
or particular groups of students. Our general 
point — that the magnitude and distribution of 
benefits depends on the values assigned — is gen- 
eral and is invariant to whose values are applied. 

In the following pages we discuss how instruc- 
tional techniques and approaches influence the 
distribution of total output and the value of that 
output and how the distribution of output relates 
to a broad concept of economic efficiency.* 

II 

Consensus on course content or instructional 
technique has not been reached in economics or 
in general. The literature abounds with arguments
for or against the "citizenship" focus [4] [5] [16]
or the preparation of potential graduate students
[13]. At another level we find some economists
calling for a heavy mathematical orientation [2]
(though whether as a "means" or as an "end" is
not always clear); others advocate a decision-
making framework [3], and still others argue the
merits of shifting emphasis from macro- to micro- 
economics [15]. And recently there has been much 
discussion of the impact of new instructional 

* Although much of college instruction is outside the
public sector, our framework is relevant to the theory 
of the public sector which permits analytically separ- 
able allocation and distribution branches. If the alloca- 
tion branch decisions affect the distribution of welfare, 
then it is not possible to determine what and how much 
of some commodity to produce unless costless lump- 
sum redistributions are possible, or unless the welfare 
function — involving the value-weights referred to in
expression (1) — is known [17]. The formulation in 
expression (1) permits the quantity of output to vary 
among consumers. This is by contrast with some of the 
literature on the pure theory of public goods, in which 
it is assumed that all consumers benefit equally in the 
sense of receiving equal quantities of output of the 
public good [14]. It is clear, of course, that even if all
did receive equal quantities of output, the individual's 
valuations could differ greatly. Instruction does have 
a considerable public-good element; thus we might well 
have couched our argument simply in terms of differ- 
ences in consumers' (students') valuations rather than 
in terms of differences both in their valuations and in 
the quantities of outputs. The difference in approach, 
however, is not substantive. Decisions will depend on 
the products of quantities and values — as expression 
(1) indicates — and our point is that these products are
likely to vary significantly with the choice of course 
approach. 



techniques, among them programmed instruction 
[1] [8], television [9] [12], and TIPS (Teaching 
Information Processing System) [6]. 

In all of these illustrations, either course con-
tent or instructional technique is being considered
explicitly. But the distribution of benefits among
types of student clientele is also very much at
stake, and differences in judgments as to whom it
is most "important" to teach may be at the root
of much of the controversy. The issue of impor-
tance or values is reflected in the right-hand
derivative ($\partial b_j/\partial q_j$) in expression (1), above.

Even if the importance of benefiting all types
of students is equal or is assigned to be equal, the
actual benefits (the left-hand derivative, $\partial q_j/\partial k$,
in expression (1)) are not likely to be equal. It
seems intuitively clear that different types of
students — as defined by previous academic per-
formance, desire for theoretical rigor, degree of
social concern, family background, and the like —
will benefit differentially, depending upon course
content and instructional technique. A highly
theoretical and mathematical formulation in the
basic economics course may provide the largest
benefits for students already thinking seriously
about graduate work in economics, whereas a
course focusing on a less formal treatment of
contemporary economic problems (and employing
a similar instructional technique)* may benefit
most those students seeking a "general" educa-
tion. Still another choice of course content is
likely to be most beneficial to pre-law students.
Given the heterogeneity of the enrolled students
and the difficulty of offering simultaneously a
variety of course contents, some students are
certain to receive larger outputs⁷ than will others.
Yet little or no evidence exists on the strength of
the linkage between the distribution of outputs
and the choice of course content.⁸

What has just been said about course content 
also applies to instructional technique. The tra- 
ditional theory of production assumes that the 

* This qualification is intended to hold instructional
technique constant in the present discussion so as to 
concentrate attention on course content. 

7 Important problems exist with respect to the
definition and measurement of "outputs" — they may 
well be multidimensional— but these are outside the 
scope of this paper. 

8 In practice it may be difficult to separate the
quantity of output from its value, and, indeed, the
two may be interrelated. For example, if the con-
sumer places a high value on increments of a particular
type of course content, then his attitude toward learning
may be "better," with the result that he will receive a
larger quantity of output. This would make expression
(1) more complex, for the quantity of output received
would itself depend on its valuation. An exploration of
this complexity is beyond the scope of our present
efforts.
choice of production technique is a decision sepa-
rable from the decision regarding the distribution
of output. This assumption seems to dominate
much of the evaluative research in economics
education, for little effort has been devoted to
finding out which kinds of students benefit, and
how much, from the use of different instructional
techniques, e.g., television and programmed
instruction.⁹ The available evidence often shows
whether the new technique is an improvement in 
the sense that students attain higher average 
scores than they do in courses using conventional 
methods. But the degree to which the new tech- 
nique alters the distribution of performance is a 
subject scarcely ever raised. 

The differential effectiveness of a given instruc-
tional technique for different groups of students
and the differential value of a unit of "effective- 
ness" for different groups of students are illus- 
trated in Figure 1. In this example, students are 
arrayed according to prior academic achievement, 
as measured on the X axis by a test given prior 
to beginning a particular course of study. The Y 
axis measures performance on an appropriate test 
of accomplishment following completion of the 
particular course. The curves labeled I and II
show course performance — using alternative in- 
structional techniques on two comparable groups 

9 This is true for all subject-matter areas, not only 
economics, and at all levels of education, not merely 
collegiate. This gap in educational research has been 
pointed out by others. For example, Robert Locke, an 
executive of the McGraw-Hill Book Company, has 
forcefully argued, ". . . the ideal program should not
only use the most appropriate media for each task or 
objective, but it should also offer alternative media to 
accommodate differences among learners in ability, 
experience, motivation, style, and rate" [7].



of students— as a function of the type of student 
within each group. "Course performance" reflects
use of a particular test instrument for measuring 
"outputs" of the course. 

The crossing of the two curves indicates that 
Technique I is more successful for the "stronger" 
students — those with prior academic achievement 
above P — whereas Technique II is more success- 
ful for the weaker students. Assuming an equal 
number of students at each level of prior achieve- 
ment, P, it is clear that the average student 
gained more from the use of Technique I (that is, 
area CDE exceeds area ABC). Thus, if one and 
only one of the two techniques is to be used, the 
preferred approach is certainly number I. Or is it? 

True, Technique I gives a larger mean and, 
hence, aggregate level of course performance. But 
what of the value of that performance? Despite 
the apparent superiority of Technique I, it would 
actually be inferior to II if the function for valu- 
ing benefits attached sufficiently greater weight 
to a unit benefit when realized by students with 
poorer prior achievement (e.g., the "disadvan- 
taged"?). How to establish the values assigned is 
not an easy matter. But it is at this stage, involv- 
ing valuation, that normative judgments are 
blended with positive findings as to the effective- 
ness of alternative instructional techniques or 
course contents. 
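
The reversal described above can be made concrete with a small sketch. The gain figures and the value weights are hypothetical, chosen only so that the two techniques cross in the manner of Figure 1; costs are left out of the comparison.

```python
# Hypothetical gains at three levels of prior achievement (low, middle, high),
# with equal numbers of students at each level. Not data from the paper.
gains = {
    "I": [2.0, 6.0, 12.0],   # stronger students gain most
    "II": [6.0, 6.0, 6.0],   # weaker students do better than under I
}

def total_value(level_gains, weights):
    """Value-weighted sum of gains across prior-achievement levels."""
    return sum(g * w for g, w in zip(level_gains, weights))

def preferred(gains_by_technique, weights):
    """Return the technique with the largest value-weighted gain."""
    return max(gains_by_technique,
               key=lambda t: total_value(gains_by_technique[t], weights))

equal_weights = [1.0, 1.0, 1.0]      # a unit of gain counts the same for all
pro_weak_weights = [3.0, 1.0, 1.0]   # assumed extra weight on weaker students

print(preferred(gains, equal_weights))     # I  (20.0 versus 18.0)
print(preferred(gains, pro_weak_weights))  # II (24.0 versus 30.0)
```

The measured gains are the same in both calls; only the normative weights change, and with them the ranking.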

The efficiency of Techniques I and II depends, 
ultimately, on costs as well as benefits. The tech- 
nique producing the largest value of gross benefits 
(output) is not the most efficient choice if its costs 
are enough greater so that the value of benefits, 
net of costs, is smaller than for some other instruc- 
tional approach. Moreover, it may be the case 
that none of the alternatives is efficient; that is, 
perhaps none produces net benefits that are 
positive. 

Two important implications of the framework 
captured in Figure 1 might be noted at this point. 
First, the possibility of intersecting lines has
profound implications regarding an analysis of the
literature on the appraisal of teaching approaches.
In general, although many studies in the educa-
tion and economics literature show no significant
impact of an experimental teaching approach, the
studies are incomplete. They do not distinguish
between the case of zero impact for all students
and the case of positive impacts for some students
and roughly equal negative impacts for others.
Indeed, as McKeachie emphasizes in his review
of the voluminous literature on college teaching:
"One reason for the host of experimental com- 
parisons resulting in non-significant differences 
may be simply that methods optimal for some 
students are detrimental to the achievement of
others" [11, p. 1157]. However, neither we nor
McKeachie knows which of the two cases above is 
closer to the truth. The implication is that much 
of the prior research needs to be reworked. 

Second, if there is a distributional impact of the 
teaching approach (i.e., if the benefit curves of the 
experimental and control approaches are non- 
parallel, whether or not they intersect), then 
empirical tests which omit these distributional 
effects will produce statistically biased results. 

III

One of us has experimented with a new instruc-
tional technique in the principles course in eco-
nomics at the University of Wisconsin, and some
of the preliminary findings are relevant here. The 
details of this experiment are not important to the 
present discussion. What is interesting is that this 
experimental approach, while having a beneficial 
impact on all students, provided larger quantities 
of output (in terms of the left-hand derivative in 
expression (1) above) to some groups of students 
than to others. 

Figure 2 shows that the experimental tech- 
nique, by comparison with a standard lecture 
technique (used with an essentially randomized 
control group of students), produced positive 
amounts of output for students at every level of 
prior achievement.¹⁰ But it also shows that the
largest outputs went to students with the lowest 
ACT scores. 

The fact that the experimental technique 
dominates the standard technique may appear to 
suggest that our findings are uninteresting in the 
context of concern about economic efficiency. 
Since the experimental technique involves added 
cost, however, this is not so. Even with the ob- 
served dominance, two issues remain unresolved: 

(1) Are the benefits disclosed in Figure 2 — the
area ABCD — large enough to warrant the costs?

(2) If total cost is a constraint, so that the experi- 
mental technique can be provided to some stu- 
dents but not to all of them, to whom should it be 
provided? 

The answer to the first question depends criti- 
cally on how the outputs (benefits) are valued, 
i.e., on the welfare function. There is, in general, 
no unique value for a unit of benefit (a point of 
added test score), for the value is a function of 
who the beneficiary is.¹¹

10 In this illustration students are classified by prior
level of academic achievement; clearly, however, other
classifications may be appropriate, e.g., in terms of
family background.

11 It may also be a function of the amount of benefit
With regard to the second question, if the ex-
perimental technique is to be provided to, say,
one-half of the students, which group should
receive it? In other words, if total resources avail- 
able are fixed, for which students will the value of 
output be a maximum? 

The first thing to note is that Figure 2 cannot 
provide the answer. What it does show is that the 
program should be provided to the students with 
the lowest ACT scores if the objective is to maxi-
mize the measured quantity of output. But
whether such an allocation would maximize the 
value of output is another issue. It will if the value 
of a unit increase in economics test score is a con- 
stant or decreasing function of prior ACT score, 
ceteris paribus. 

If, however, there were a preference for helping
"strong" students — a preference that may be
reflected in the offering of "honors" courses —
then it could be efficient to devote the added
resources to students with high ACT scores.¹² All
that is formally required to produce this result is
that the value of output CD (Figure 2) exceed the
value of AB, and that there be no sizable inter-
action effects among students.¹³ Since AB ≈ 5CD



realized by any given beneficiary; that is, there may be 
decreasing, or perhaps increasing, marginal value with 
respect to added units of benefit to a particular student
or group of students. 

12 It would be instructive to consider the forms of
weighting functions that would justify providing large
amounts of resources to particular subsets of students,
as is done in honors courses and in special programs for
the disadvantaged.

13 The latter assumption is necessary to insure that
the separation of students into two or more homogene-
ous groups would not alter their respective class per-
formance.
in the figure, an allocation of the experimental
technique toward strong students is efficient if
the value of additional economics output (as
defined by the test used) is at least five times as
great for a student at the 95th ACT percentile as
it is for a student at the 25th percentile.¹⁴
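
The five-to-one threshold follows from comparing the value of output under the two allocations. In the sketch below, $v_H$ and $v_L$ are assumed symbols (they do not appear in the original) for the value of a unit of output to a high-ACT and a low-ACT student, respectively; AB and CD are the output gains read from Figure 2.

```latex
\begin{align*}
\text{Allocate to strong students if}\quad v_H \cdot CD &\ge v_L \cdot AB, \\
\text{that is, if}\quad \frac{v_H}{v_L} &\ge \frac{AB}{CD} \approx 5 .
\end{align*}
```

Only the ratio of the two values enters the condition, which is why an index of relative values suffices for this problem.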

IV 

The arguments in this paper have a number of 
research implications. Granted that an efficient 
allocation of instructional resources requires com- 
parisons of benefits with costs of alternative pro- 
grams, more information is needed as to the mag- 
nitudes of the benefits. These, in turn, depend on 
the distribution of benefits, since the value of a 
given absolute increment in output of instruction 
(i.e., achievement) is specific to the recipient. 
Therefore, it is clear that we need to learn more 
about who benefits; that is, what types of stu- 
dents benefit and how much each benefits when 
various combinations of course content and in-
structional technique are used. With this informa-
tion in hand, normative judgments can then be
applied — as expression (1) above indicates — in 
order to estimate the value of benefits from any 
particular course content or instructional tech-
nique.

The following are some specific research 
suggestions: 

1. Batteries of pre- and post-course evaluations 
should be developed which reflect well-articulated 
output goals. While this paper has not concen- 
trated on a precise specification of outputs, the 
importance of such a specification is clear. The 
distribution of benefits (outputs) cannot be ascer- 
tained without a prior decision — either explicit or 
implicit — as to the definition of output. More
tests measuring different types of outputs are
needed.¹⁵

2. When regression techniques or other statis- 
tical procedures are used in evaluating teaching 
alternatives, interaction effects should be included 
so as to permit estimation of relationships be- 
tween the instructional approach and a variety of 
student attributes (e.g., class, major, mathematics 
knowledge, family background). Parameters of 
the interaction effects — such terms as the applica-
tion of television or programmed learning to stu-
dents with and without calculus — will provide

14 For this problem — with the cost level constant —
only an index of relative values is needed, not absolute 
monetary values. 

15 TUCE (Test of Understanding in College Eco-
nomics) and TEU (Tests of Economic Understanding)
represent an excellent beginning on this long-term
project [5] [18].






information about the distributional impact of the 
teaching approach that is being studied.¹⁶ If
students benefit differentially from alternative 
teaching approaches, then failure to account for 
interaction effects between the teaching approach 
and student attributes will result in a mis-speci- 
fied statistical model, biased parameter estimates, 
and often uninterpretable results.

3. Finally, it is important to face up to the
issue of delineating explicitly our normative cri-
teria on what "should" be the distribution of
outputs. We must place "values" on the outputs.
Until these value weights are made explicit — and
the task is not simple — they will continue to be
implicit and, hence, not open to critical examina- 
tion. 
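
The second suggestion, on interaction effects, can be illustrated with a minimal sketch. It relies on the standard fact that in a regression with a treatment dummy, an attribute dummy, and their product, the interaction coefficient equals the difference between the two subgroup treatment effects, so cell means suffice here. The scores, the calculus attribute, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical post-course scores keyed by (treated, has_calculus).
# 1 = experimental technique / has calculus; 0 = control / no calculus.
scores = {
    (0, 0): [50, 52, 48],
    (1, 0): [44, 46, 45],
    (0, 1): [60, 62, 58],
    (1, 1): [66, 64, 65],
}

def pooled_effect():
    """Treated-minus-control difference in means, ignoring student type."""
    treated = [s for (t, _), v in scores.items() if t == 1 for s in v]
    control = [s for (t, _), v in scores.items() if t == 0 for s in v]
    return mean(treated) - mean(control)

def subgroup_effect(has_calculus):
    """Treatment effect within one student type."""
    return mean(scores[(1, has_calculus)]) - mean(scores[(0, has_calculus)])

# Interaction: how much the treatment effect differs between the two types.
interaction = subgroup_effect(1) - subgroup_effect(0)

print(pooled_effect())      # 0: looks like "no significant impact"
print(subgroup_effect(0))   # -5: students without calculus lose
print(subgroup_effect(1))   # 5: students with calculus gain
print(interaction)          # 10
```

A study reporting only the pooled comparison would find no effect in these made-up data, although the technique helps one group and hurts the other.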

V. Summary 

The central thesis of this paper is that produc-
tion decisions in education on what and how to 
teach have distributional effects; as a result, dis- 
tributional considerations should enter directly 
when making teaching decisions and evaluating 
these decisions. 

The importance of this proposition and thus the 
validity of the inferences derived from it involve 
both factual and normative matters. If alterna- 
tive teaching techniques do have differential im- 
pacts by type of student (the factual issue) and/or 
if social objectives dictate that it is more impor- 
tant to benefit some types of students than others 
(the normative issue), then the distributional 
consequences of selecting a teaching technique or 
course approach should receive explicit attention 
in benefit-cost analyses of production choices. 

16 With respect to the pathbreaking research of
Attiyeh, Bach, and Lumsden on the impact of pro-
grammed instruction in economics [1] [8] we are en-
couraged to learn that analysis is currently under way 
to investigate some of these types of interactions, and 
thus the distributional effects of programmed learning. 
CONCEPTUAL AND EMPIRICAL
ISSUES IN THE ESTIMATION OF 
EDUCATIONAL PRODUCTION 
FUNCTIONS* 
ABSTRACT



Measuring educational performance and understanding its determinants are 
important for designing policies with respect to such varying issues as teacher
accountability, educational finance systems, and school integration. 
Unfortunately, past analyses of student achievement and educational produc- 
tion relationships have been plagued by both a lack of conceptual clarity and a 
number of potentially severe analytical problems. As a result, there is 
considerable confusion not only about what has been learned, but also about
how such studies should be conducted and what can be learned. This review 
considers each of these issues and also relates knowledge from these studies to 
research about areas other than just school operations and performance. 

Despite a substantial and growing volume of research into educational 
production relationships and the determination of student achievement, 
considerable confusion remains about how such studies should be 
conducted, how past analyses should be interpreted, and what has been and 
can be learned from such studies. These studies are interesting and important
from a number of perspectives. First, they exemplify the difficulty 
frequently encountered in the empirical application of some basic economic 
models, and the lessons there apply to a wider class of problems than just 
understanding schools. Second, the results of these studies have ramifica-
tions for a variety of analyses that do not focus on schools, such as wage
determination, status achievement, the financing of schools, and the 
impacts of quality of education on urban location and housing choice. 
However, perhaps most important— and a distinguishing feature of this
research— is that the concluding sections of these papers on "policy implica- 
tions" do not have the more common hollow ring because the results of this 
research have frequently entered into judicial proceedings, legislative 
debate, and executive branch policy deliberations. Unfortunately, this 
research has been unusually difficult to follow, in part because its develop-
ment has frequently crossed disciplinary boundaries.

I. SOME BACKGROUND

The first, and perhaps still the most influential, study is Equality of Educa- 
tional Opportunity, or the "Coleman Report" [37]. That study, mandated by 
the Civil Rights Act of 1964, was startling in a number of ways. First, the 
survey information for over half a million students, containing data not only
about students and the characteristics of their more than 3,000 schools, but 
also about their achievement in school, provided the most complete descrip- 
tion of elementary and secondary schools ever produced in this country. 
Second, and most relevant for this discussion, it directed attention to the 
importance of the relationship between school inputs and student achieve-
ment.¹ Finally, it introduced into the public policy arena a bewildering
array of technical and esoteric issues such as statistical significance, analysis 
of covariance, production efficiency, multicollinearity, residual variation, 
estimation bias, and simultaneous equations. 

The attention paid to the input-output analysis in the Coleman Report 
clearly reflects the direct policy importance of the analysis. Such informa-
tion is critical not only to "school management," but also to such diverse
policy issues as school integration, accountability in schools, and the finance 
of elementary and secondary schools. The policy relevance of input-output 
studies has led to both a rapid growth in number of analyses and a concerted 
effort to interpret the many different, and apparently contradictory, results 
(e.g., Hanushek and Kain [66], Bowles [21], Bowles and Levin [23, 24], Cain
and Watts [28], Averch and others [11], Levin [91]).

As economists entered this area, the relationships estimated became 
known as "educational production functions" instead of simply input-output 
analyses. This was more than a semantic change, however; the term produc- 
tion function has special connotations that alter the interpretation of the 
results. In fact, one part of the discussion of these analyses is whether or not 
the models estimated are production functions in the usual technical 
meaning of the term. 

1 This perspective on the analysis of schooling, as pointed out in the response of Coleman 
[38] to his critics, was perhaps the most important aspect of the "Coleman Report." 
II. TEXTBOOK ANALYSES

All sophomore economics classes develop the concept of a production func- 
tion and its uses in analyzing the decisions of firms. In its usual abstract 
presentation, a firm's production possibilities are assumed to be governed 
by certain technical relationships, and the production function simply 
describes the maximum output feasible with different sets of inputs. The key 
distinction between a "production function" and any number of alternative 
descriptions of input and output relationships is the notion that it represents 
the maximum achievable output for given inputs. Firms are conceptualized 
as attempting to maximize profits through decisions about level of produc- 
tion and mix of inputs (given the production function, product demand, and 
input prices). With some embellishments, this represents the motivation for 
and theoretical backdrop to production function studies.²

The production function, along with the related theoretical apparatus 
of optimal firm decisions, is a powerful pedagogical tool, since it provides a 
basis for describing efficient production, the appropriate response of firms 
to changes in technology or input costs, and so forth. Further, the basic 
analytical constructs seem applicable to a wide variety of applications— there 
is a priori no indication that this structure applies to, say, the steel industry 
and not the education industry. 

These theoretical concepts are, however, deceptively simple. When
taken out of the classroom, they often require substantial modification. At
one level, the concept of a technologically based production function may,
in reality, not be particularly applicable in a variety of actual situations.³ At
another level, even if appropriate, the actual production function is generally 



2 This analysis can be extended to consider multiple outputs or wider ranges of inputs. 
Additionally, certain assumptions about the form of the production relationships can be 
included. For example, declining marginal products for inputs are often assumed; that is, 
the additional output per unit from increasing a given input might be expected to decline as
larger quantities of the input are used (holding constant other inputs). One might also 
believe that certain inputs are complements in production; that is, the value of a given
input might increase as more of another (complementary) input is used. Finally, the 
possible effects of different scales of operation can be incorporated. 

3 The standard "textbook" view is that production functions are derived from known
engineering relationships that reflect exogenously given technological processes. The firm
decides upon a mix of inputs, and the best process for combining these inputs is indicated
by the production function and, therefore, does not have to be explicitly considered. A
recurring theme, elaborated below, is that this may not be a good characterization of many
production processes and firm decisions where: (1) individuals involved in production
actually have considerable discretion in choice of process; (2) the "best" process might not
be generally known and uncertainty is important; or (3) dynamics are important and the
production technology is changing. For a general statement of these issues, see Nelson and
Winter [109, 110, 111].
not known a priori and must be estimated based upon the observed opera-
tions of firms. However, such estimation raises a series of conceptual and 
statistical problems. 

Although some significant differences exist in the application of 
production functions to education and to other industries, the biggest differ- 
ences probably arise from the potential uses of the analysis. Few people 
would expect manufacturing firms to change their behavior given estimated 
production functions for manufacturing industries (see, for example, 
Hildebrand and Liu [71] or Griliches [54]), and there is very little temptation 
to prescribe any public policies based upon the results. The same cannot be 
said for education. Congress holds hearings on the size of estimated coeffi-
cients (see, for example, [141]); commissions include results in supporting
their policies (for example, [47] or [116]); and courts receive testimony
about regression equations.⁴ Because the findings and interpretations of
educational production functions go far beyond answering a set of esoteric 
questions of economists, educational production functions have been 
discussed more widely, and the confusion surrounding them seems 
somewhat greater, than production function estimation in general. 

III. CONCEPTUAL PRODUCTION FUNCTIONS AND
EDUCATIONAL REALITIES 

Studies included under the rubric educational production functions are 
generally statistical analyses relating observed student outcomes to 
characteristics of the students, their families, and other students in the 
school, as well as characteristics of schools. Most frequently, student 
outcomes are measured by various standardized test scores, although 
attitudes, college continuation, and attendance patterns have also been 
analyzed. These studies also diverge considerably in terms of the actual 
measured inputs; in terms of the level of aggregation of both dependent and
independent variables (e.g., individual student, school average, or district 
average observations); and in terms of the precise statistical methods. Not 
surprisingly, given such differences, the conclusions of the various studies 
appear to be very different — and often apparently contradictory.

This paper considers the major conceptual and empirical issues in such 
analyses with an emphasis upon the implicit assumptions and alternative 
interpretations of past models and results. While there is no attempt to 
review systematically individual studies, a later section summarizes the 



4 For example, Keyes v. School District No. 1, 413 U.S. 189 (1973); Serrano v. Priest, 5
Cal.3d 584 (1971); Hobson v. Hansen, 269 F.Supp. 401 (D.D.C. 1967).
major findings and ambiguities along with indicating some areas of
profitable future research.⁵

A. Measurement of Output

The vast majority of production function studies measure output by 
standardized achievement test scores. A few have also considered other 
measures such as student attitudes (Levin [89], Michelson [101], Boardman,
Davis, and Sanday [18]), attendance rates (Katzman [80]), and college 
continuation or dropout rates (Katzman [80], Burkhead, Fox, and Holland 
[26]). Are any, all, or none of these sensible measures of educational output? 

While standard production theory concentrates upon varying quantities 
of a homogeneous output, this is not easily translated into an educational 
equivalent. Education is a service which transforms fixed quantities of 
inputs (i.e., individuals) into individuals with different quality attributes. 
Educational studies rightfully concentrate upon "quality" differences. 
However, simply because individuals can be ordinally ranked in terms of 
cognitive test scores does not imply that such a measure is necessarily 
appropriate. 

Perhaps the most important concern with standardized tests is the lack 
of external validation. These tests do discriminate among individuals; that 
is, they can divide the population into different groups. However, questions 
are generally selected by criteria internal to tests: (a) their ability to divide 
students (so that questions that can be answered by all or none of the 
relevant population aren't useful); and (b) their consistency with other 
questions (i.e., whether individuals getting a given question right tend to get
other questions on the test right). Further, a given test should produce the
same score if taken at different times by the same individual, and slightly 
different wordings of questions covering the same concept should yield the 
same results. None of these relates directly to whether or not tests cover 
material, knowledge, or skills valued by society. 
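The two internal selection criteria described above can be sketched with a small item analysis. This is only an illustration under stated assumptions: the response matrix is hypothetical, and the two statistics used here (share answering correctly, and the correlation of an item with the score on the remaining items) are common stand-ins for criteria (a) and (b), not a description of any particular test publisher's procedure.

```python
# Hypothetical 0/1 response matrix: rows are students, columns are test items.
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
]


def pearson(x, y):
    """Pearson correlation, or None when either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def item_statistics(responses):
    """Return (share answering correctly, item-rest correlation) per item."""
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Score on all other items, so the item is not correlated with itself.
        rest = [sum(row) - row[j] for row in responses]
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats


for j, (share, r) in enumerate(item_statistics(RESPONSES)):
    print(j, round(share, 2), None if r is None else round(r, 2))
```

In this made-up data the first item is answered correctly by everyone and so cannot divide students, while the last item is unrelated to performance on the rest of the test. Both would tend to be dropped on internal grounds, and neither statistic says anything about whether the material is valued outside the test.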

Clearly, much of the observed interest in school system performance 
relates to the perceived importance of schooling to future capabilities of 
students— the ability of students to cope with and perform in society after 
they have left school. To be sure, there is some value to knowledge for its 
own sake, other things being equal, and more knowledgeable individuals

5 Perhaps the most thorough review of the findings per se is still Averch and others [11].
However, this is now somewhat out of date. A number of noteworthy studies do not appear
in that volume: Central Advisory Council [32], Levy [92], Boardman, Davis, and Sanday
[18], Garner [50], Perl [115], Heim and Perl [69], Summers and Wolfe [133, 134], Murnane
[106], Winkler [149], Jencks and Brown [77], Henderson, Mieszkowski, and Sauvageau
[70], Armor and others [6], Ritzen and Winkler [120], Winkler [150], Maynard and
Crawford [100], Link and Ratledge [96].
may be more interesting, happier, or whatever. However, if schools were
perceived to perform a simple monastic role, it is inconceivable that they 
would receive the attention and interest that they do. Here we consider two 
dimensions of school effects: the effect on labor market performance and 
the effect on socialization, that is, political awareness, citizenship, moral
values, etc. 

Economists have analyzed the influence of education on earnings and 
labor market performance (see reviews by Mincer [103] and Rosen [123]). 
Sociologists have explored the effects of schooling on occupational choice, 
mobility, earnings, and the relationship between schooling and personal and 
family characteristics (Sewell and Hauser [126], Boudon [20], Jencks and 
others [76], Alwin [4], Duncan, Featherman, and Duncan [44], Jencks and 
Brown [77], Blau and Duncan [17]). These studies direct attention to the 
critical question of what role formal education plays in influencing later lives 
of citizens — a focus frequently lost in research into school operations. 

However, a recurring problem with such studies is the inadequate 
measure of the education individuals receive. Most commonly, years of 
schooling is used to measure education. (This is even the case in models of
"human capital production functions"; see Ben-Porath [13].) Few measures
of the quality of education have been incorporated in such studies. Since the 
most pressing school policy questions concern how to improve the quality of 
education, this is a particularly unfortunate limitation. 

Some attempts have been made to incorporate qualitative measures, 
such as information about cognitive abilities of individuals or about school 
expenditure levels, into labor market studies. 6 Such studies have been
severely limited by data availability, the necessity to use fairly peculiar
samples, and stringent assumptions about school operations. (For example,
expenditure studies assume that expenditure differences index quality differences.)
Further, the results with respect to the effects of quality differences
have been quite inconclusive. Thus, while these studies offer an important 
perspective on how to observe educational outcomes, they do not currently 
provide much guidance to studies focusing on the operations of schools. 7 

6 See Welch [145, 147], Weisbrod and Karpoff [143], Ashenfelter and Mooney [10], Rogers
[122], Weiss [144], Hansen, Weisbrod and Scanlon [60], Hanushek [62, 64], Johnson and
Stafford [78], Morgenstern [104], Taubman and Wales [136], Solmon [129], Link and
Ratledge [95], Jencks and Brown [77], Ribich and Murphy [117], Lee [87], Wachtel [142],
and Akin and Garfinkel [2]. Only the Welch studies and the Jencks and Brown study 
attempt to consider the operations of schools. 

Research on "ability" and earnings (e.g., Griliches and Mason [56] or the review in
Griliches [55]) is also related if ability is considered endogenous.

7 Related research into evaluating the effects of various manpower and job-training
programs on subsequent labor market performance (e.g., Cain and Hollister [27],
Ashenfelter [9], Kerachsky and Mallar [81], and Kiefer [82]) offers a similar perspective.
Although the relationship of schooling and labor market performance
is central to many policy questions, it is not the only area of interest. Hence, 
studies have also examined the role of education in increasing job 
satisfaction (Duncan [43], Black [16]), in maintaining personal health 
(Grossman [57], Manheim [97]), and in increasing the productivity of 
mothers engaged in household production, as well as the effects of the 
mother's education on the learning of young children (Hill and Stafford [72], 
Leibowitz [93], Lindert [94], Inman [73]). Further, political scientists have 
considered the effect of education on political socialization and voting 
behavior (Campbell and others [29], or the review by Niemi and Sobieszek 
[112]), and sociologists have considered the relationship between education
and criminality. 

While these studies have suggested some gross effects of quantity of 
schooling on other life outcomes, they have virtually never addressed the
question here — how such outcomes vary in response to differences in school 
programs and operations. Analysis of nonlabor market areas — if refocused 
toward the performance of the educational system and qualitative differ- 
ences in schooling — does have the potential for providing a more balanced 
perspective on educational productivity. Nevertheless, existing studies have 
yielded inconclusive results about the effects of even quantity of schooling, 
let alone the more detailed information. 

A more fundamental shortcoming is the superficiality of the conceptual 
notions of the mechanisms by which education affects skills and later experi- 
ences. Cognitive skills, the chief measure of educational quality, may not be 
the only, let alone the most important, outcome of schooling in determining 
individuals' future success. One might think that more educated individuals 
can accomplish given tasks better or more swiftly, but surely this holds for 
only certain types of jobs. Less education may even be better in jobs 
requiring manual skills or jobs that are very repetitive. One rather 
commonly held presumption is that better educated individuals are able to 
perform more complicated tasks or are able to adapt to changing conditions
and tasks (see Welch [146] and Nelson and Phelps [108]). This hypothesis 
has important implications for studying the productivity and outputs of 
schools through understanding of the mechanisms by which school interacts 
with the work place. Such understanding could provide considerable insight 
into how to measure the outcomes of schooling (or at least where to look) 
and how these outcomes might change with the character of the economy. 
The lack of conceptual clarity holds equally for other potential outcomes of 
the educational system. 

While these studies have in general not considered programmatic differences in any detail,
such consideration could provide additional insights into the characteristics of educational
programs that are important for future success.






THE JOURNAL OF HUMAN RESOURCES 



The uncertainty about the source of schooling-earnings relationships is 
also highlighted by the recent attention to "screening" aspects of schooling.
Schools may produce more qualified individuals or may simply identify the 
more able. The latter view has been the subject of both theoretical and 
empirical treatment by economists and sociologists (Spence [130], Taubman
and Wales [136], Berg [14], Thurow [138], Thurow and Lucas [139], Riley 
[119], Arrow [7], Stiglitz [131], Wolpin [152], Layard and Psacharopoulos 
[86]). Most of the attention paid to screening models arises from the implica- 
tion that the social value of schooling may be considerably less than the 
private value (that is observed in earnings relationships) if schools are 
merely identifying the more able instead of actually changing their skills. 
Further, the screening model suggests both possible reinterpretation of the 
historical contribution of education to economic growth (see Denison [41]) 
and revisions of expectations about future returns to schooling. (These 
revisions depend upon the "quality" of the screening function as schooling 
distributions change and the response of firms to any such changes.) 
However, there are also direct implications of the screening model for the 
measurement of educational outcomes and the analysis of educational 
production relationships. In a screening model, the output of schools is 
information about the relative abilities of students, and this would suggest 
that more attention should be directed toward the distribution of observed 
educational outcomes (instead of simply the mean outcomes) and their 
relationship to the distribution of underlying abilities. Further, the interpre- 
tation of some studies, such as those of school dropout rates discussed 
below, might be radically altered, since schools with a higher dropout rate 
might actually be providing better information (higher output) than those 
with lower rates — an interpretation that is very different from that of the 
authors of these studies. Unfortunately, no persuasive test has been devised
to distinguish between a screening model and the more standard "produc- 
tion" model. 

These two views, production and screening, are also not the only
models explaining subsequent performance. For example, Jencks and
others [76] argue that luck and personal characteristics (that are unrelated to
schooling) are the most important determinants of earnings differences.
Bowles and Gintis [22] believe that earnings differences arise chiefly from
the existing social structure and that schools adjust to, rather than determine,
subsequent outcomes. While these latter two views are not completely
convincing, available evidence does not conclusively differentiate among
these four divergent views. 8

8 The Jencks and others [76] conclusions rest upon the finding that regression analyses of
earnings include sizable unexplained variation which they label "luck." However, since
these analyses are based upon a small number of crudely measured characteristics (years of






Hanushek 




Referring back to the original question, we find simply a large degree of 
uncertainty about the appropriateness of test scores as outcome measures. 
While the various studies of lifetime outcomes are conceptually very 
relevant to measuring school outputs, they have not been particularly 
illuminating for the study of school production functions. While it would not
be particularly surprising if standardized test performance was not highly 
correlated with future success (since the standard test construction 
methodology makes little pretense of relating test performance to any 
external criteria), available empirical evidence is inconclusive about 
whether or not there is some fortuitous linkage between test scores and 
subsequent achievement. 

Nevertheless, performance on tests is being used to evaluate educational 
programs, and even to allocate funds, and there are some pragmatic 
arguments for the use of test scores as output measures. Besides their 
common availability, one argument is that test scores appear to be valued in 
and of themselves. To a large extent, educators tend to believe that they are 
important, albeit incomplete, measures of education. Further, parents and 
decision-makers appear to value higher test scores — at least in the absence 
of evidence that they are unimportant. (Note the continued pressures to 
make scores more publicly available.) 

A more persuasive argument for the use of test scores relates to 
continuation in schooling. Almost all studies of earnings which include both 
quantity of schooling and achievement differences find significant impacts of 
quantity that are independent of achievement differences. 9 This implies that 
achievement differences do not adequately measure all skill differences. 
However, at the same time, test scores appear to have an increasing use in 
selecting individuals for further schooling. Thus, they may relate directly to 



schooling, age, and perhaps measured achievement-test scores or family background), it
should not be surprising that much is left to be explained. Moreover, there is little basis for
labeling our ignorance (the residuals from a regression analysis) in any particular manner.
Direct analysis of individual earnings over time (Hanushek and Quigley [67]) indicates that 
about two-thirds of the unexplained variation in earnings models represents stable, but 
unmeasured, individual factors. 

The Bowles and Gintis conclusions [22] in part rest on similar evidence: measures of
IQ or cognitive ability differences do not explain much of the variation in individual
earnings. This, combined with an analysis of the historical development of U.S. schools, is
used to support their "social structure" view of earnings determination. However, their
analysis relates solely to the U.S. economy and the U.S. schooling system. There is no
independent analysis of how differences in social structure affect earnings possibilities, 
even though this is presumably the relevant evidence. 
9 An exception is Hansen, Weisbrod, and Scanlon [60], but also see the comments on this
study by Chiswick [33] and Masters and Ribich [98]. See also Gintis [51] for a review of
some of the literature on this topic up to about 1970.
the "real" outputs through the selection mechanism. 10

Finally, a few miscellaneous issues about output measurement should 
be added. First, if one does use test score measurements, there are a number 
of choices, related simply to the scaling of scores. Tests are often available in 
"grade level" equivalent, percentile ranking, or raw score forms, all of 
which provide the same ordinal ranking (except for the possibility of some 
compression of the rankings). Yet, for most statistical work, one wants a 
scale which indicates how different individuals are rather than one that 
simply ranks them. The choice really depends upon the relationship of these 
estimates of output to the subsequent outcomes, which are not known. 11 
Second, there is some movement toward criterion-referenced tests, that is, tests
that relate to some set of educational goals. The crucial issue is the develop- 
ment of goals. The previous discussion argues for goals that relate to 
performance outside of schools, but it is not obvious that these goals guide 
much of the current development work. 
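The scaling point above can be made concrete with a small sketch. The raw scores below are hypothetical, and the percentile definition (percent of the group scoring strictly below) is one simple convention among several; the only purpose is to show that two scales can agree on the ordering of students while disagreeing on how far apart they are.

```python
def percentile_ranks(scores):
    """Percent of the group scoring strictly below each score."""
    n = len(scores)
    return [100.0 * sum(other < s for other in scores) / n for s in scores]


def gaps(values):
    """Differences between adjacent entries, in the order given."""
    return [b - a for a, b in zip(values, values[1:])]


RAW = [12, 30, 31, 32, 33, 60]   # hypothetical raw scores, already sorted
PCT = percentile_ranks(RAW)

print(gaps(RAW))                          # distances on the raw-score scale
print([round(g, 1) for g in gaps(PCT)])   # distances on the percentile scale
```

Both scales rank the six students identically, but the raw scale puts the lowest and highest students far from the middle group while the percentile scale spaces everyone evenly. A regression that treats either scale as a measure of "how different" students are is therefore making an assumption that the test itself cannot justify.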

B. Multiple Outputs 

Most educational production function studies have analyzed a single output 
or, alternatively, a series of output measures without consideration of their 
interactions. (Exceptions include Levin [89], Michelson [101], Boardman, 
Davis, and Sanday [18], and Brown and Saks [25].) If indeed the educational 
process is best characterized as producing a set of outcomes (say, cognitive 
skills and political awareness) and if there are important interactions among 
them in production, then interpretation of commonly estimated models for a 
single outcome becomes complicated. 

Ordinary least squares (OLS) regression analysis, which is commonly 
used in analyzing production functions for single outcomes, is generally 
inappropriate when there are multiple outcomes that are simultaneously 
produced. In the simplest case, assume that there are two outputs, one of 

10 This argument is found in Dugan [42]. The use for predicting future school performance 
and for selection is also central in Wirtz and others [151]. 

The interpretation of tests and their use in selection may be changing, however. On
the one hand, at least by newspaper accounts, there is a growing concern about the
information contained in test scores. On the other hand, courts are increasingly becoming
concerned with the use of tests for selection, particularly when they might have discriminatory
outcomes. For example, in Griggs v. Duke Power Co., 401 U.S. 424 (1971), and a host
of similar cases, a central issue is whether test performance relates to job performance.

11 This is actually just a special case of more general questions about the functional form of
production functions (discussed below). Coleman and Karweit [39] argue against use of
grade-level equivalents on the basis of their peculiar properties even for answering qualita- 
tive questions. Whether or not tests measure accurately the activities and learning that 
take place in schools has not been considered here. For the purposes here, such concern 
simply deserves little attention if unrelated to subsequent outcomes. 
which is an "intermediate" outcome (such as attitudes toward school) that is
not valued itself, and one of which is a "final" outcome (such as achieve- 
ment) that is valued by decision-makers; also, assume that the underlying 
production relationships (the structural equations) are such that, in addition 
to a series of exogenous factors (such as family background and schools), 
attitudes affect achievement and, simultaneously, achievement affects 
attitudes. In such a situation, both structural equations can generally be 
estimated simultaneously (although not with OLS methods). Alternatively, 
it is possible to estimate the "reduced-form" equation for the separate 
outcomes (where the reduced form is the relationship between one of the 
outcomes and the exogenous variables and is found by substituting one 
structural equation into the other). The reduced-form equation, which can 
generally be analyzed with OLS techniques, indicates both the direct and 
indirect impacts (through the other outcomes) of the exogenous variables.
The reduced-form estimates do not indicate the process by which an 
exogenous variable affects an outcome and may be misleading if one under- 
takes policies that change the structural relationships. However, there are 
no quarrels with the underlying statistical methods or the judicious interpretation
of the results. 12 Many single-equation analyses can be interpreted
simply as attempts to estimate reduced-form relationships. 
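The distinction between structural and reduced-form estimates drawn above can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the two linear structural equations, the coefficient values, the variable names, and the independent normal errors are hypothetical and are not taken from any study cited here.

```python
import random


def ols2(y, x1, x2):
    """OLS slopes of y on x1 and x2 (intercept included), via the normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=50000, seed=0):
    """Simulate two jointly determined outcomes and compare estimators."""
    rng = random.Random(seed)
    b, d = 0.5, 0.4   # assumed: attitudes -> achievement, achievement -> attitudes
    c, g = 1.0, 1.0   # assumed: school input -> achievement, home input -> attitudes
    ach, att, school, home = [], [], [], []
    for _ in range(n):
        x, w = rng.gauss(0, 1), rng.gauss(0, 1)      # exogenous inputs
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)    # structural errors
        # Joint solution of:  ach = b*att + c*x + e1   and   att = d*ach + g*w + e2
        y = (c * x + b * g * w + e1 + b * e2) / (1 - b * d)
        a = d * y + g * w + e2
        ach.append(y); att.append(a); school.append(x); home.append(w)
    rf_school, _ = ols2(ach, school, home)   # reduced form: exogenous inputs only
    st_att, _ = ols2(ach, att, school)       # OLS applied to the structural equation
    return {
        "reduced_form_school": rf_school,
        "reduced_form_target": c / (1 - b * d),
        "structural_ols_attitude": st_att,
        "true_attitude_effect": b,
    }


print(simulate())
```

Under these assumptions the reduced-form coefficient on the school input settles near c / (1 - b*d), the direct effect scaled up by the feedback through attitudes, which is what the text means by direct plus indirect impacts. OLS applied directly to the achievement equation overstates the effect of attitudes, because attitudes are themselves partly a consequence of the achievement error.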

However, alternative multiple-outcome models are more complicated. 
Consider again the case of two outcomes except now let both be "final"
outcomes that are independently valued by decision-makers. With information
about the alternative outputs, inputs, and decision-makers' valuation of
outputs, the structural equations can again be estimated directly. 13
However, with information about only one output, estimation of the
reduced form might be quite misleading. 14 The estimated effects of various
inputs will reflect both the production technology (the effect of each input 



12 Underlying this is a series of complicated statistical arguments and assumptions. For
example, there are the issues of identification, the distribution of the error terms, etc.
There are also specialized forms of the structural models, such as recursive models,
which can be estimated by OLS. Discussion of these issues along with discussion of the
desirability and methods of estimating the structural equations can be found in Hanushek
and Jackson [65, Chs. 8 and 9].

Reduced-form estimation still requires some specification and measurement of the
other equations in the system, which may be difficult. For example, with attitude formation,
little is known about the determinants of attitudes, and clearly experiences outside
schools (for which data are often lacking) are important.

13 There is, in reality, little information about decision-makers' choices among outcomes.
This makes estimation of the structural equations very difficult, even when data are 
available about the relevant outputs and inputs. 

14 In the general case, the errors in the reduced-form equation will be correlated with the 
exogenous factors through the decision function about different outputs. 
on the single output) and the choice between outputs, not simply the
production technology. 

The empirical importance of this issue is generally unknown. It depends 
importantly upon the degree of "jointness" of production, the form of the 
production function, the variance of choices, the underlying decision rules 
for determining choices, and the accuracy of measuring inputs. It is possible 
to construct models where joint production appears extremely important 
(e.g., Brown and Saks [25] develop a model where both mean and variance 
of achievement are valued and where simple reduced-form estimates appear 
quite misleading). However, there is a wide variety of circumstances where 
such issues are inconsequential. 15 Without more information about both the 
range of outputs (and their measures) and the potential decision rules, there 
is little that can be said about this problem. 

Given measures of alternative outputs, it would be appealing to look at 
production functions for "total" output. This is essentially what is done for
production function estimates in other sectors, where market prices are used 
to aggregate outputs. However, these prices are not available for education. 
Even if available, they may be inappropriate since the weights in the 
decision function for outputs may differ from the market prices. 16

Consideration of multiple outputs does suggest that production func- 
tions estimated with test-score measures might be more appropriate in 
earlier grades, where the emphasis tends to be more on basic cognitive skills 
— reading and arithmetic — than in later grades. In other words, these 
outputs appear to be much more heavily weighted than others at earlier 
grades, and therefore the potential problems of multiple outputs are less 
than in later grades. 17 

C. Inputs to the Production Process 

The usual prescription for developing the relevant set of inputs to a produc- 

15 Take a simple example where one is concerned with alternative outputs which are
independently produced; say, one is concerned with reading ability and sex education and
they do not interact (i.e., each does not appear in the other structural equation). As long as
we have accurate measures of all exogenous variables (discussed below), estimation of a 
single equation may not be affected by the decision process which weights the two outputs. 
Alternatively, if the relative weights placed upon the two outputs vary dramatically— say, 
cognitive skills are emphasized much more than other skills— the problems may be 
empirically insignificant. 

16 This is not an issue in studying competitive industries where it is assumed that managers 
maximize profits; thus, the weights in the output decision function are simply the market 
prices. 

17 Note that all production function studies have been conducted for elementary and 
secondary schools. In postsecondary education, few people believe that test scores 
adequately measure outputs. 
tion process is to find an engineer who will describe the technical character-
istics and specifications of the process. When considering education, the 
"engineers" are usually thought to be learning theorists. Nevertheless, 
almost all educational analyses begin with laments about how we do not 
have any learning theory that is suitable for guiding input-output analyses. 
In reality, engineers give little guidance to the development of production 
functions developed for any sector. Typical production function studies, 
say, for manufacturing industries, incorporate two or possibly three inputs 
(capital, labor, and possibly education level of labor), and few would argue 
that engineers had much to do with this specification. This set of inputs does, 
however, match the pedagogical models and, while there are some minor 
debates about such issues as the measurement of capital, generally receives
limited attention. 

In education, the relatively fixed input of labor and capital (i.e., one 
teacher per classroom with a relatively small variance in class size) implies 
that this simple description of inputs could explain little. Somewhat ironically, 
because educational studies have attempted to provide much more detail 
about input differences, they have been faced with much more criticism 
about the specification of the inputs. 18

Part of this criticism is explained by the fact that input specification has 
not received much attention in many past analyses. There is little conceptual 
clarity, and the choice of inputs seems, sometimes explicitly, to be guided 
more by data availability than by any notions of conceptual desirability.
For example, nowhere in the Coleman Report can one find a statement of an
underlying conceptual model. At times such a model seems implied, but the 
statistical analyses do not seem to relate to the implied model (see Hanushek 
and Kain [66]). 

Conceptually, a model such as equation (1) seems generally acceptable:

(1) $A_{it} = f(B_i^{(t)}, P_i^{(t)}, S_i^{(t)}, I_i)$

where, for the $i$th student, $A_{it}$ = achievement at time $t$; $B_i^{(t)}$ = vector of
family background influences cumulative to time $t$; $P_i^{(t)}$ = vector of
influences of peers cumulative to time $t$; $S_i^{(t)}$ = vector of school inputs
cumulative to time $t$; and $I_i$ = vector of innate abilities. The development of
this model and background analyses entering it are discussed elsewhere 
(Hanushek [61]), and this discussion will only highlight important issues. 

In the abstract, it is difficult to quarrel with this specification; controversy 
enters only when more detail about the definition and measurement of 
variables and the form of the functional relationship are introduced. The 
first important point is that the inputs are those that are relevant to the 

18 Again, part of the attention to model specification relates to the very different purposes 
behind educational analyses and analyses of other sectors (see above) 
individual student. Additionally, the model portrays the educational
production relationship as cumulative— past inputs have some lasting 
effect, although the value in explaining output may diminish with more 
distant inputs. A corollary to this last point is that, without fairly strong
assumptions about the dynamics of education—that is, the time paths of 
adjustment to change— the data requirements are huge. 

In part to circumvent some of the data requirements (and in part 
because of other considerations discussed below), an alternative version of 
this model has sometimes been analyzed. If equation (1) holds at different 
points in time, say, a past period $t^*$, we can consider the change in
achievement between $t$ and $t^*$ as in equation (2):

(2) $A_{it} = f^*(B_i^{(t-t^*)}, P_i^{(t-t^*)}, S_i^{(t-t^*)}, I_i, A_{it^*})$

where the inputs are measured over the period $t^*$ to $t$. 19 This formulation,
sometimes referred to as the "value added" specification, apparently lessens
the data requirements, but it does so at the expense of some additional 
assumptions about the relationships (discussed below). 
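To see how differencing reduces the data requirements, it may help to work through one special case. The linear, additive form below is purely an assumption for illustration: the coefficient vectors $\beta$, $\pi$, $\sigma$, the time-varying ability weight $\lambda_t$, and the error $\varepsilon_{it}$ are hypothetical symbols that do not appear in the original, and the general function in equation (2) need not take this form.

```latex
% Illustrative linear special case (an assumption, not the specification in the text).
% Inputs accumulate additively over periods \tau; \lambda_t lets the effect of
% innate ability I_i change over time.
\begin{align*}
A_{it}   &= \sum_{\tau \le t}
            \left( \beta' B_{i\tau} + \pi' P_{i\tau} + \sigma' S_{i\tau} \right)
            + \lambda_t I_i + \varepsilon_{it}, \\
A_{it^*} &= \sum_{\tau \le t^*}
            \left( \beta' B_{i\tau} + \pi' P_{i\tau} + \sigma' S_{i\tau} \right)
            + \lambda_{t^*} I_i + \varepsilon_{it^*}, \\
A_{it}   &= A_{it^*} + \sum_{t^* < \tau \le t}
            \left( \beta' B_{i\tau} + \pi' P_{i\tau} + \sigma' S_{i\tau} \right)
            + (\lambda_t - \lambda_{t^*}) I_i
            + (\varepsilon_{it} - \varepsilon_{it^*}).
\end{align*}
```

The third line is the first minus the second, with prior achievement moved to the right-hand side. In this special case all inputs dated before $t^*$ drop out, and innate ability enters only through the change in its weight, which is why fewer historical data are needed. The price is the added structure: additivity, constant coefficients over time, and a differenced error term that is correlated with measured prior achievement.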

Consider now the empirical implementation of these models. Most 
analyses are purely cross-sectional and include only contemporaneous 
measures of the inputs. No studies have adequate measures of initial endow- 
ments (or "learning capacity"). Many educational inputs (e.g., family 
educational inputs) are not measured directly, but instead are proxied by 
other observable attributes (such as socioeconomic background of the 
family). Little attention is given to the dynamic structure, that is, how the 
effects of different inputs cumulate. The relevant inputs (e.g., school 
factors) are often measured with considerable error. 

The divergence of the conceptual model and the empirical models 
which have been estimated means that interpretation of the empirical results 
often requires a series of implicit assumptions, some of which are very 
dubious. The remainder of this section attempts to make explicit the most 
important assumptions underlying the empirical analyses. 

The most consistent and obvious divergence of the empirical models 
from the conceptual models is the lack of measurement for innate abilities. 
In fact, there is little clarity about what should be measured in this term ($I_i$).

19 This equation results from simply subtracting equation (1) for time $t^*$ from equation (1) for
$t$. However, instead of analyzing $A_{it} - A_{it^*}$ as the dependent variable, $A_{it^*}$ is put on the
right-hand side. There are three reasons for doing this: (1) empirically, $A_{it}$ and $A_{it^*}$ may
well be different tests with different scaling, etc.; (2) levels of starting achievement ($A_{it^*}$)
may influence achievement gain; and (3) correlated errors in achievement measurement
may suggest such a formulation (Cronbach and Furby [40]). However, the latter argument
suggests that further corrections for errors in the exogenous variables, probably based
upon test-reliability measures, are also needed since such errors, even if they have zero
means, will yield inconsistent estimates.
Presumably, it includes "learning capacity," but this is not well defined.

In a regression framework, the effect of omitting an important variable 
is bias in the estimated regression coefficients. The importance (size of bias) 
is related both to the strength of the variable on achievement and the 
correlation of the omitted variable with other included variables in the 
model. If innate abilities were uncorrelated with all of the included
variables, the only effect would be to increase the residual variance, and
there would be no bias of the other coefficients estimated. However, there is
some evidence that these correlations are not zero. If $I_i$ is related to IQ, we
know that, in particular, IQ is correlated with family background (either 
through genetics or environment). For example, Scarr and Weinberg [125] 
find that 10 to 30 percent of the variation in IQs of adolescents is explained 
by measured family characteristics. 20 Further, the correlations for younger 
children are higher. This implies that the omission of innate abilities 
probably biases upwards the estimated impact of family background on 
achievement. 21 At the same time, it is plausible to assume that biases in 
other parts of the model will be considerably less, particularly in the case of 
school inputs. The correlations between innate abilities and school
attributes, after allowing for family background factors, are likely to be
small. 22 Likewise, the importance of these omitted factors is lessened if the
estimated model is equation (2), since "level" effects would be included in
$A_{it^*}$ and only "growth" effects of innate abilities would be omitted. (See
Boardman and Murnane [19] for a discussion of potential biases in alterna- 
tive specifications.) 
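
The direction of these biases can be illustrated with a small simulation. The sketch below is illustrative only and is not drawn from any of the studies cited: the variable names, the coefficient values of 1.0, and the assumed dependence of family background on innate ability (0.5) are hypothetical, and the school input is assumed to be uncorrelated with ability. The helper `ols` is a plain normal-equations solver written for this example.

```python
import random

def ols(y, columns):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

random.seed(0)
n = 20000
ability = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical assumption: family background is correlated with innate ability,
# while the school input is not.
family = [0.5 * a + random.gauss(0, 1) for a in ability]
school = [random.gauss(0, 1) for _ in range(n)]
achievement = [1.0 * f + 1.0 * s + 1.0 * a + random.gauss(0, 1)
               for f, s, a in zip(family, school, ability)]

full = ols(achievement, [family, school, ability])   # ability observed
short = ols(achievement, [family, school])           # ability omitted

print("family coefficient, full vs. short:", round(full[1], 2), round(short[1], 2))
print("school coefficient, full vs. short:", round(full[2], 2), round(short[2], 2))
```

Under these assumptions the family background coefficient in the short regression converges to about 1.4 rather than 1.0 (the omitted effect of 1.0 times 0.5/1.25), while the school coefficient stays near its true value.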

The next major category of empirical problems is the accuracy of 
variable measurement, a problem which occurs in several different forms. 
Frequently, only contemporaneous measures of the exogenous variables are 



20 The reported numbers are R²s for regression equations which include family attributes 
(e.g., mother's and father's education, etc.) on children's IQs in biological families. The 
range in R²s basically arises from the inclusion of parents' IQs. For adoptive families, the 
R²s range between .02 and .16. 

21 The bias is complicated in models with many exogenous variables; in that case it depends 
upon the sample partial correlations of the omitted variables with all of the exogenous 
variables. Plausible assumptions about the partial correlations, however, indicate an 
upward bias in family background coefficients. For details of biases, see Hanushek and 
Jackson [65, Ch. 4]. 

22 Innate abilities are probably positively correlated with school attributes and peers because 
higher SES families generally live in relatively homogeneous neighborhoods and because 
they also select, or demand, higher quality schools. However, the concern is not the simple 
correlations, but instead the correlations after controlling for family background differ- 
ences; these are likely to be considerably smaller. On the other hand, the existence of 
ability-tracking might increase the correlations of innate abilities and school inputs if 
tracked students systematically receive different school inputs (see Rosenbaum [124] and 
Alexander and McDill [3] on tracking and school inputs). 

available, implying that the cumulative variables are generally measured 
with considerable error. Even if the errors of measurement have a mean of 
zero, the coefficients will be biased; the amount of bias is roughly proportional 
to the variance of the measurement error relative to the variance of the true 
variable (see Hanushek and Jackson [65, Ch. 10]). In this case, however, the 
errors of measurement for background factors (and the biases in these 
coefficients) are probably less than for other factors, since current measures 
of backgrounds give a better picture of historical factors than either current 
measures of peers (because of migration or changing of schools) or current 
measures of school inputs. These measurement error problems are 
undoubtedly more severe in models such as equation (1) than for those in 
equation (2), where the relevant history is back to t* rather than to birth. 
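
A minimal sketch of this attenuation, assuming the classical errors-in-variables case with a single regressor: the true coefficient of 1.0, the unit variances, and the error standard deviations tried are hypothetical choices for illustration. In that case the estimated slope converges to the true slope times var(true) / (var(true) + var(error)), so the proportional bias grows with the relative error variance.

```python
import random

def slope(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
n = 20000
true_input = [random.gauss(0, 1) for _ in range(n)]        # cumulative (historical) input
achievement = [1.0 * x + random.gauss(0, 1) for x in true_input]

results = {}
for error_sd in (0.0, 0.5, 1.0):
    # Contemporaneous measure = true input + zero-mean measurement error.
    measured = [x + random.gauss(0, error_sd) for x in true_input]
    results[error_sd] = slope(measured, achievement)
    expected = 1.0 / (1.0 + error_sd ** 2)   # var(true) / (var(true) + var(error))
    print(error_sd, round(results[error_sd], 3), round(expected, 3))
```

Even though the errors average zero, the estimated coefficient shrinks toward zero as the error variance rises.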

The biases from "historical" measurement errors are probably most 
severe for schooling inputs and, to a lesser extent, for peer influences. 23 
Common "contemporaneous" measurement errors probably also impact 
most severely on school inputs. Much analysis has tried to capitalize on 
readily available school data — data produced routinely for administrative 
purposes. Typically, these data provide measures of average teacher or 
school characteristics, but are not linked to individual students, as called for 
in the conceptual model. In fact, schools are often very heterogeneous 
institutions with considerable intraschool variance in staff and programs. 
This problem becomes more acute at later grade levels where average 
characteristics may give very misleading indications of the actual inputs to 
any given student. 

Frequently, educational production functions are interpreted as if the 
included variables are conceptually correct and accurately measured, when in fact 
this is not the case. However, the severity of such problems differs signifi- 
cantly across studies and clearly explains part of the apparent inconsistency 
in findings. Moreover, within most studies, measurement errors are 
probably most important in the case of school inputs, leading in general to 
underestimates of the importance of school inputs. 24 

23 At the elementary school level, current measures of peers may not in general be too bad, 
i.e., may not have particularly large measurement errors. However, the problem worsens 
significantly at later grades where it is common to collect students from a variety of schools 
at a given junior high or high school. Nevertheless, for some analyses such as the effects of 
racial composition of peers, the situation is more complicated, even at the elementary 
school level. Commonly, schools observed to be well integrated (those with, say, 20 to 60 
percent black students) at any point in time are actually quite unstable; they are going 
through a transition period. This implies that the current racial composition, particularly in 
the midranges, may not accurately reflect historic racial composition. Similarly, for schools 
where the racial composition has recently changed, as with redistricting or busing, the 
current situation may be a poor indicator of the historic situation. 

24 Background characteristics are generally measured by a variety of socioeconomic variables. 
Conceptually, the variables should measure the direct learning provided in the home along 

Typically, the school inputs used in empirical analyses include objective 
measures of teacher characteristics and schools such as education levels, 
teacher experience, and age of school buildings. Some also include more 
detailed aspects of teachers such as undergraduate majors or teacher verbal 
ability, which can be interpreted as simply attempting to measure 
homogeneous "quantities" of inputs. While these fit clearly in the conceptual 
framework, consideration of another set of perhaps important variables 
(such as measures of the organizational context of schools and the educational 
process itself) introduces a set of conceptual problems with the production 
function terminology and framework. Typically (outside of education), 
measures of organization and process are seen as irrelevant in estimation. 
Production functions are interpreted as the relationship between inputs and 
outputs mutatis mutandis. Information about production possibilities is 
essentially viewed as being publicly available in the form of scientific and 
engineering knowledge, and production processes are reproducible through 
blueprints and machinery. The possibility of dynamic choices in process 
made by the actors in production is not considered, and the choice of "best" 
process is assumed automatically made after selection of inputs. While the 
appropriateness of this framework is open to question in a wide number of 
instances, it is particularly questionable in the case of education. 25 

In the education context, there are two separable classes of issues. First, 
there are observable "macro" organizational and process characteristics of 
the school (such as class organization, curricula, departmentalization, 
length of the school day, etc.) which represent clearly defined and repro- 
ducible educational practices. Second, there are aspects of the process which 
are difficult to disentangle from the characteristics of individual teachers 
(such as classroom management, methods of presenting abstract ideas, 
communication skills, etc.). 

with attitudes, etc. Past research suggests that the learning environment in the home is 
highly correlated with SES. Empirically, the models do not appear to be sensitive to the 
precise measure of SES used. For policy considerations, however, it is important to note 
that the SES measures are only proxies for some more fundamental characteristics, and it is 
unlikely that changing the measured characteristic (say, current income) will have much of 
a short-run effect on achievement. At times we might be interested in determining whether 
there are any immediate effects of changing the measured SES of a family (say, through a 
negative income tax). However, it should be recognized that this is a reduced-form 
relationship (where there is another structural equation which relates learning environ- 
ment in the home to current attributes) and that policies aimed at SES simply might alter 
the structural relationships. 

25 The importance of "process" choice is apparent in any activities which involve individual 
"skill"; e.g., the difference between chefs is probably not just a difference in cookbooks, 
or blueprints. Organizational issues have been treated to some extent, such as in models of 
learning-by-doing, but in general have not been very well developed. See Nelson and 
Winter [111] for a more general critique of the shortcomings of the engineering view of 
production functions. 

The first set of factors can readily be accommodated in the conceptual 
framework (although the actual empirical implementation may be more 
difficult). While decision-makers may not accurately perceive the impact of 
various macro organization and process choices and thus may not make the 
best choices, production functions can be estimated conditional upon these 
factors. In fact, there has been some, although not extensive, investigation 
along these lines. 26 

However, the second type of process effect creates more serious 
problems, both for the application of the general conceptual model and for 
the interpretation of any estimated effects. Many educational decisions are 
"micro" ones made by the actors themselves, mainly teachers. These are 
both difficult to observe and measure and, quite possibly, not easily 
reproduced. As a shorthand description, these factors will be referred to 
simply as "skill" differences. Once the possibility of skill differences, or 
embodied process in individuals, is introduced, the language (if not the 
conceptual framework) of production functions begins to fail. It is even 
difficult to define just what "maximum possible output" might mean since it 
is difficult to specify what the "homogeneous" inputs are. 

There is some indication that these latter individual differences are 
quite important. The explanation of the apparent insignificance of macro 
process variables in Armor and others [6] is the great variation in implemen- 
tation of overall process decisions at the classroom level. This is also 
supported by detailed analysis of the implementation of innovative 
techniques at the classroom level (see Berman and McLaughlin [15]). 
Finally, more direct analysis indicates that roughly only half of total teacher 
performance can be explained by measured teacher and classroom 
attributes. 27 

26 For example, Armor and others [6] test a variety of macro organization and process 
variables including techniques of reading instruction, time spent on reading, team- 
teaching, open classrooms, and variety of materials used, but find no significant impacts 
on achievement. In empirical work, organizational forms or processes which represent 
simple, well-defined choices (such as the use of a given standard curriculum) are easily 
included. However, more complicated or less well-defined factors (such as departmentali- 
zation which depends not only upon the organization, but also the particular teachers) 
present more formidable problems that are related to the second category of process 
effects (below). 

There have also been a large number of direct investigations of alternative processes 
or organizational forms, generally following experimental approaches and thus having a 
more narrow focus. See, for example, Jamison and others [74], Carpenter and Hall [31], 
Garfinkel and Gramlich [49], Gramlich and Koshel [52], Cicirelli and others [35], Armor 
[5], Barnow and Cain [12], Kiesling [84], Fox [48], and Rivlin and Timpane [121]. These 
analyses have fairly uniformly shown few achievement effects. 

27 This evidence comes from an analysis which first estimates the "value added" of individual 
teachers through individual teacher dummy variables (Hanushek [61]) and then attempts 
to explain these estimated differences by measured teacher and classroom variables 

Recognition of skill differences has implications for discussions of 
"efficiency in production" (discussed below). It also alters our interpreta- 
tion of teacher and school inputs. It is still reasonable to consider the impact 
of measured attributes of teachers, since many school decisions such as 
hiring and salary are based upon a set of these characteristics. However, the 
estimated impact of these measured attributes, following the above 
discussion, indicates the ability either to predict or to develop more skilled 
teachers. For example, the almost universal finding that more education of 
teachers has no impact on achievement can be interpreted as indicating that 
teacher training institutions do not, on average, change the skills of 
teachers. Or, alternatively, the frequent finding that class size doesn't affect 
achievement may arise from complicated (and unobserved) interactions 
with teacher process choices; therefore, while it is possible that smaller 
classes could be beneficial in specific circumstances, it is also true that, in the 
context of typical school and teacher operations, there is no apparent gain. 

One implication of this discussion is that more effort should be devoted 
to understanding and measuring both the macro and micro organization and 
process characteristics of schools. This represents a distinct break from the 
tradition of production function analysis. There is no presumption that 
schools systematically choose the best process given the inputs; thus, 
estimates of education "technology" must be made conditional upon the 
chosen macro organization and process characteristics. At the individual 
teacher level, the estimated impact of teacher characteristics can be thought 
of as reduced-form coefficients which include both direct effects (say, of 
teacher experience) and indirect effects through systematic choice of micro 
process. 

D. Efficiency in Production 

One important issue is whether or not schools are efficient in production. 
This has important policy implications since inefficiency indicates the 
possibility of increasing school outputs with no additional inputs. However, 
there is a prior statistical and interpretive issue: Since estimation is based 
upon the observed behavior of schools, the estimated relationships may not 
trace out the production frontier if schools are not producing the maximum 
output for given inputs. In such cases, the relationships will describe average 
behavior which may not be particularly useful in predicting how changes in 
inputs would affect outputs. Past discussions of efficiency have nevertheless 
been confused because both the concepts of efficiency being applied and the 
appropriate ones in this case have not been clear. 

Traditionally, two concepts of efficiency are considered. Economic 
efficiency refers to the correct choice of input mix given the prices of inputs 
(and the production function). Technical efficiency refers to operating on 
the production frontier, that is, maximizing output for a given set of inputs. 
Past efficiency discussions have blurred these two concepts and, more 
importantly, have neglected consideration of how expanding the usual 
concept of production functions to recognize both macro organization and 
process choice and skill differences of inputs affects efficiency discussions. 

Two arguments have been used to support the assertion that schools are 
technically inefficient. 28 First, educational decision-makers are apparently 
not guided by incentives to maximize profits or to conserve on costs. Second, 
they might not understand the production process and therefore can't be 
expected to be on the production frontier. The first argument, while raising 
the possibility of economic inefficiency, does not necessarily imply being off 
the production frontier unless resources are also wantonly squandered. 29 
The focus of the second argument generally appears related to the 
importance of macro organizational and process choices. The relevance of 
these can be analyzed, and, importantly, their presence does not 
significantly alter the interpretation of empirical analyses as production 
functions. 30 Direct analyses of these factors, while not completely 
conclusive, do not indicate their overwhelming importance (see above). 

The possibility of skill (or "embodied process") differences among 
inputs to schooling introduces a new dimension to the efficiency discussion. 
The standard conceptual framework indicates that, if two production 
processes are using the same inputs, any systematic difference in outputs 
reflects inefficiency. However, the concept of skill differences simply 
recognizes that individuals with the same measured characteristics make a 
series of important production decisions (reflected in behavior, process 
choices, etc.) that are difficult to identify, measure, and model. Therefore, it 
is not surprising that the same measured inputs yield variations in output, 
but at the same time it is difficult to label such observed variation as 
efficiency differences. 

Introduction of skill differences does not, however, eliminate the 
usefulness of a general production framework. For many purposes, the 



28 In most production function estimation outside of education, it is assumed that profit 
motivation dictates efficiency. An exception is Leibenstein [88] who argues that production 
inefficiency is more common than generally assumed. 

29 In fact, if all schools were economically efficient and input prices were the same 
everywhere, it would not be possible to estimate production functions. The production 
function is, in education and in other areas, identified either by input price variations or by 
economic inefficiency. 

30 As suggested above, production functions can be estimated conditional upon macro 
organization and process choice. Comparison of the "best" technology with alternative 
ones then provides estimates of this source of inefficiency. Clearly, if these factors are 
important and correlated with observed input usage, estimation not considering them 
would be misleading (i.e., would describe average relationships). 

desired information is what aspects of teaching can be replicated (or 
predicted) in different situations. Most research has concentrated upon 
systematic, measurable characteristics (the reduced-form models of 
teacher effects), and these estimates do indicate what can be replicated in 
the absence of shifts in the underlying structure. 

Some research has in fact estimated the total effects of individual 
teachers without regard to actual measurement of underlying attributes and 
confirms that important dimensions of teacher quality are not captured by 
measured teacher attributes. 31 An important sidelight of such investigations 
is that decision-makers might be able to identify underlying skill differences 
among teachers with fair accuracy. Murnane [106] found that principals' 
evaluations of teachers were highly correlated with estimates of total effec- 
tiveness. For many purposes, this is almost as good as the ability to identify 
differences ex ante. 32 
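
The two-step logic behind such estimates can be sketched in a few lines. This is a stylized illustration, not the procedure of any study cited here: the numbers of teachers and students, the single measured attribute, and the even split of true effectiveness between measured and unmeasured components are all hypothetical, and no student-level controls are included.

```python
import random

def r_squared(x, y):
    """Squared sample correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

random.seed(2)
n_teachers, class_size = 500, 30

measured = [random.gauss(0, 1) for _ in range(n_teachers)]     # a recorded attribute
unmeasured = [random.gauss(0, 1) for _ in range(n_teachers)]   # "skill" absent from the data
# Hypothetical assumption: effectiveness is split evenly between the two components.
effect = [0.7 * m + 0.7 * u for m, u in zip(measured, unmeasured)]

# Step 1: with one dummy per teacher and no other regressors, each dummy
# coefficient is simply the mean achievement gain of that teacher's class.
estimated_effect = []
for e in effect:
    gains = [e + random.gauss(0, 1) for _ in range(class_size)]
    estimated_effect.append(sum(gains) / class_size)

# Step 2: share of the estimated total effects explained by the measured attribute.
share_explained = r_squared(measured, estimated_effect)
print(round(share_explained, 2))
```

By construction roughly half of the variation in estimated teacher effects is left unexplained; the remainder corresponds to what the text calls skill differences, plus sampling noise in the class means.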

Finally, concern about technical inefficiency has led to some, basically 
nonstatistical, estimation of the production frontier. 33 Besides assuming 
accurate measures of both inputs and outputs, this analysis appears internally 
inconsistent: it is motivated by the perceived uncertainty about the produc- 
tion process, yet assumes that the researcher knows and measures all of 
the inputs to the production process. Further, the possibility of nonreprodu- 
cible skill differences is totally neglected. 
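
The mechanics of the linear programming approach described in footnote 33 can be sketched for the simplest case of one input. The six (input, output) pairs are hypothetical, and brute-force enumeration of candidate lines stands in for a linear programming solver; it is adequate here only because the problem has two parameters, so the optimum lies on a line through two observations.

```python
from itertools import combinations

# Hypothetical (input, output) pairs for six schools.
schools = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 5.0), (6.0, 6.0)]

def frontier(points):
    """Return (cost, a, b, support) for the line y = a + b*x that lies on or
    above every point and minimizes the total overshoot sum(a + b*x_i - y_i)."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        gaps = [a + b * x - y for x, y in points]
        if min(gaps) < -1e-9:          # line dips below some school: infeasible
            continue
        cost = sum(gaps)
        if best is None or cost < best[0]:
            best = (cost, a, b, [(x1, y1), (x2, y2)])
    return best

cost, a, b, support = frontier(schools)
print("frontier: y = %.2f + %.2f x" % (a, b))
print("rests on observations:", support)
```

For these data the frontier is y = 1.5 + 1.0x and is determined entirely by two schools, (2.0, 3.5) and (4.0, 5.5). Any measurement error in those two observations moves the whole frontier, and the other four schools contribute nothing to its position.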

31 Several studies have been conducted without resort to just measured teacher characteristics 
such as experience, education, etc., but have used general covariance analysis schemes to 
estimate the "total" effect of teachers, in terms of both measured and unmeasured 
characteristics (see Hanushek [61], Murnane [106], or Armor and others [6]). Similar 
analysis (Hanushek [63]) for "skill" differences among principals, while somewhat limited, 
shows no differences. 

32 This indicates that accountability of individual teachers is not impossible. There is some 
concern, however, about just how this information might be used for educational policy. 
The principal observations used by Murnane took place in a situation where they were not 
actually used for any decisions. (For interpretation of these results, it should be pointed out 
that the evaluations were correlated with the value added of teachers and not simply the 
overall level of student achievement. Thus, it is not the case that principals' evaluations 
just reflected observed total performance of students.) 

33 One proposed technique is linear programming analysis (Aigner and Chu [1], Carlson [30], 
Levin [91]). This technique effectively locates the "most productive" schools and places 
the production plane through just these observations. In fact, the estimation is based upon 
a small set of observations (equal to the dimensionality of the inputs). 

A related inquiry by Klitgaard and Hall [85] attempts to identify "unusually effective" 
schools, i.e., schools which perform significantly better than would be expected on the 
basis of SES composition of students and community characteristics. Such schools are 
identified on the basis of residuals from achievement regressed on SES of students. 
However, if school attributes are important and correlated with SES, this approach can be 
very misleading— particularly when important "school variables" are subsequently 
identified. The residuals simply cannot be interpreted in this manner because they will be 
biased and inconsistent. 

E. Miscellaneous Issues and Nonissues 

A number of more detailed criticisms of estimated educational production 
functions have also surfaced. This section briefly discusses the major ones 
(functional form, level of aggregation, selection effects, multicollinearity 
among inputs, and general statistical methodology), along with their effect 
on the interpretation and conduct of production analyses. 

1. Functional Form. Educational production functions have been 
estimated in a variety of forms, although most frequently variants of linear 
or logarithmic models. 34 Conceptually, there is little guidance about func- 
tional form. 35 Empirically, the issue of functional form appears to be a 
second order problem, since distinguishing among alternative functional 
forms is often impossible. The point is a simple one: within a limited range of 
variation, many functional forms look very similar. An important 
corollary, however, is that over wider ranges of variation, different func- 
tional forms may yield very different results, implying that predictions based 
upon changes that are far from current observations may be perilous. For 
example, while variations in class size, within the limited ranges observed, 
have little apparent effect on achievement, this is not necessarily the case for 
radically different class sizes. 
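
A small numerical sketch of this point, under hypothetical assumptions: achievement is taken to follow 60 - 5 ln(class size), class sizes are observed only between 20 and 30, and both a linear and a logarithmic model are fitted to those observations. None of these numbers comes from the studies discussed.

```python
import math

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

sizes = list(range(20, 31))                            # observed range of class sizes
scores = [60.0 - 5.0 * math.log(s) for s in sizes]     # hypothetical "true" relationship

lin_a, lin_b = fit_line(sizes, scores)
log_a, log_b = fit_line([math.log(s) for s in sizes], scores)

def linear_pred(s):
    return lin_a + lin_b * s

def log_pred(s):
    return log_a + log_b * math.log(s)

# Largest disagreement between the two fitted forms inside the observed range,
# and their disagreement for a radically smaller class of 5 students.
in_range_gap = max(abs(linear_pred(s) - log_pred(s)) for s in sizes)
far_gap = abs(linear_pred(5) - log_pred(5))
print(round(in_range_gap, 3), round(far_gap, 3))
```

Within the observed range the two forms are nearly indistinguishable, while their predictions for a class of five differ by several points, which is why extrapolation far from current observations is hazardous.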

2. Level of Aggregation. While the conceptual model is at the 
individual student level, much analysis — relying upon data collected for 
other purposes — is actually conducted at a more aggregate level, say, the 
school or district level. The effects on the estimates of such aggregation 
depend crucially upon the nature of educational relationships. 

In the simplest case, when the production process is approximately 
linear in the same parameters for all students, OLS estimates on the 
aggregate data will be unbiased, although probably less precise than if 
individual data were available. 36 In more complicated situations, aggregation 



34 The variety is actually fairly large. Various authors have considered various stratifications 
(such as by race or socioeconomic background); e.g., Hanushek [61], Coleman and others 
[37], Smith [128]. Others have conducted general covariance analyses which allow 
unconstrained functional forms in terms of underlying descriptors of teachers (Hanushek 
[61], Murnane [106]). Finally, a variety of interactions among variables have been 
introduced (Winkler [149], Summers and Wolfe [133, 134]). 

35 Linear models imply independence of the various inputs and constant marginal products, 
while logarithmic models allow declining marginal products but constrain the form of 
interactions of variables. Much production function analysis outside of education has 
centered on the properties and usefulness of alternative functional forms. See, for 
example, the work on Cobb-Douglas forms in Hildebrand and Liu [71]; constant elasticity 
of substitution models in Arrow and others [8]; transcendental log functions in Christensen, 
Jorgenson, and Lau [34]; and generalized production functions in Hanoch [59]. However, 
as explained previously, these analyses are difficult to translate for education. 

36 More generally, even with a distribution of parameters across students, OLS estimates of 
the mean parameters are unbiased as long as the parameters are uncorrelated with the 

has less innocuous effects. For example, if two groups of students (say, 
blacks and whites) have different production relationships where the 
differences are not easily parameterized, estimation with aggregate data 
yields "average" coefficients which depend upon the weighting of the two 
groups in the sampled observations and which are difficult to interpret. (See 
Stodolsky and Lesser [132] on the possibility of racial differences.) 
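
The simplest case described above, a linear relationship with the same parameters for every student, can be checked with a short simulation. The slope of 2.0, the 200 schools of 25 students each, and the variance choices are hypothetical; the sketch compares the slope estimated from individual observations with the slope estimated from school means.

```python
import random

def slope(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(4)
n_schools, n_students = 200, 25
x_all, y_all, x_bar, y_bar = [], [], [], []
for _ in range(n_schools):
    school_level = random.gauss(0, 0.3)                  # between-school variation in the input
    xs = [school_level + random.gauss(0, 1) for _ in range(n_students)]
    ys = [2.0 * x + random.gauss(0, 1) for x in xs]      # same linear relation for every student
    x_all += xs
    y_all += ys
    x_bar.append(sum(xs) / n_students)
    y_bar.append(sum(ys) / n_students)

b_individual = slope(x_all, y_all)   # based on 5,000 student observations
b_aggregate = slope(x_bar, y_bar)    # based on 200 school means
print(round(b_individual, 3), round(b_aggregate, 3))
```

Both estimates center on the true slope, but the aggregate estimate uses far less of the variation in the input and so is the less precise of the two. The sketch does not cover the case of groups with different relationships, where the aggregate coefficient depends on the mix of groups in the sample.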

Nevertheless, probably the most serious "aggregation" problem is 
really one of errors of measurement. The researcher frequently has 
individual data about students (say, achievement and family background), 
but only aggregate data about schools. The temptation is to use all available 
data by mixing individual characteristics with aggregate school data. 
However, the school factors relevant to any individual may differ signifi- 
cantly from the average. (Consider, for example, the situation in a large 
comprehensive high school.) Here, aggregation generally helps; the errors 
in measurement for a model of average achievement and average charac- 
teristics are almost certainly less than with individual achievement and 
average school characteristics. 37 

Aggregation as an errors-in-variables problem may be quite pervasive. 
Even with data on individual classrooms, the internal allocation of time and 
attention of students implies that each student might receive different inputs 
(see, for example, Garner [50], Wiley and Harnischfeger [148], and Karweit 
[79]). We don't have a good understanding of the importance of such varia- 
tions. Analysis of classroom composition effects can be interpreted as 
attempts to model this, but the results about the importance of peer 
compositions are mixed. 38 As discussed previously, the accuracy of 
measurement of inputs is an extremely important topic, and probably much 
more important than just aggregation. 

3. Selection Effects and Causation. For policy purposes, information 
about causal relationships between school factors and achievement is 
needed. However, such information (about the direction of causation) 
cannot come directly from the observed data and correlations, but must be 
introduced from a priori information about the structure of the overall 



exogenous variables and have the same mean across students; see Swamy [135] for discus- 
sion of such "random" coefficient models. With both random coefficients and simple 
aggregation, the efficiency of estimation can generally be improved with techniques other 
than OLS (see Hanushek and Jackson [65]). Such results do not, however, hold when there 
are important nonlinearities in the production process. 

37 This errors-in-variables argument is a major criticism of the original Coleman work (see 
Hanushek and Kain [66] and Hanushek [61]). It is particularly damaging in the analysis of 
variance framework of Coleman and others [37]. 

38 Compare Hanushek [61] with Henderson and others [70] for peer estimates at the 
classroom level. See also Murnane [106]. 

model. The primary concern in the production function setting is the effects 
of teacher selection and assignment mechanisms. 

Consider the simple case of an observed positive relationship between 
teacher experience and student achievement (holding other factors 
constant). Depending upon the mechanism by which teachers are assigned 
to schools, this need not imply that increasing average experience levels in a 
school will increase achievement (i.e., that there is a causal relationship 
running from experience to achievement). If, for example, more senior 
teachers were allowed to choose their schools and teachers had a preference 
for teaching higher achieving students, then achievement would, at least in 
part, "cause" experience; and a policy change that increased experience 
would not yield the (full) effect on achievement expected from the estimated 
relationship. Other, and more subtle, selection effects might also occur; 
more educated or more intelligent teachers may, through their own efforts 
or the direct assignments of principals, be placed in "faster" classes. 

The situation is really another case of simultaneous equation bias. The 
importance of these effects depends upon the importance of achievement in 
determining assignments of different types of teachers, and there has been 
little direct analysis of this. The appropriate solution is estimation of the 
simultaneous system (see fn. 12), but this has not been done. 
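
The experience example can be made concrete with a stylized simulation. The true effect of 0.1 per year, the sorting rule under which senior teachers end up with higher-achieving classes, and all variances are hypothetical assumptions chosen for illustration.

```python
import random

def slope(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(5)
n = 20000
true_effect = 0.1                                         # causal effect of a year of experience
prior_level = [random.gauss(0, 1) for _ in range(n)]      # achievement level of each class
# Hypothetical assumption: more senior teachers sort into higher-achieving classes.
experience = [10.0 + 2.0 * a + random.gauss(0, 2) for a in prior_level]
achievement = [a + true_effect * e + random.gauss(0, 1)
               for a, e in zip(prior_level, experience)]

estimated_effect = slope(experience, achievement)
print(round(estimated_effect, 3), "vs. true effect", true_effect)
```

Under these assumptions the estimated coefficient converges to about 0.35, more than three times the true effect, so a policy that raised experience would yield far less than the estimated relationship suggests.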

Greenberg and McCall [53] analyzed a single urban school system in the 
early 1970s and concluded that race and socioeconomic background of 
students were systematically related to the selection and transfer of teachers 
with different education and experience levels. However, Murnane [107] 
suggests, from analysis of a different school system, that declining enroll- 
ments and the subsequent surplus of teachers have led to a much greater 
reliance on institutional rules and much less on individual teacher 
preferences (which was the hypothesized mechanism in Greenberg and 
McCall [53]). 

Nevertheless, the potential problems arise from achievement affecting 
selection, and not from family background, race, or other factors that are 
included on the right-hand side of the estimated model affecting selection. 
In the latter instance (which would be a recursive structure), even though 
some correlation among the right-hand side variables may be induced by this 
mechanism, there are generally not serious problems; without other such 
selection effects, the estimated relationships with achievement can plausibly 
be interpreted as causal relationships. Clearly the severity of the problem is 
related to the structure of the model estimated and in many instances is only 
serious in the presence of fairly subtle selection mechanisms (particularly in 
a "value-added" specification). 

4. Multicollinearity. Since the discussion of multicollinearity in educational 
research by Bowles and Levin [24], it has been taken as an almost ever-present, but 
lamentable, fact of life in most estimation. In fact, it is the first item 
discussed under "Analytical Problems" in the review of production function 
research by Averch and others [11]. Fortunately or unfortunately, multi- 
collinearity does not appear to be the villain it has been made out to be, 
although it may partially explain some of the apparent inconsistencies in 
existing research. 

The statistical story is that disentangling the separate effects of 
exogenous variables which are very highly intercorrelated can be difficult. 
Further, in the usual case of positive intercorrelations, the parameter 
estimates themselves will tend to be negatively correlated so that quite 
commonly a coefficient has the "wrong sign" because of the correlations of 
variables. 
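
The sign pattern described above can be made concrete in the simplest case. The following is a minimal sketch, not taken from the original paper: it assumes exactly two regressors that have been centered and scaled to unit variance, with sample correlation r, n observations, and error variance sigma2 (the function and parameter names are illustrative). Under those assumptions the covariance matrix of the least squares slope estimates is sigma2 / (n (1 - r^2)) times [[1, -r], [-r, 1]].

```python
def slope_covariance(r, n, sigma2=1.0):
    """Covariance matrix of two OLS slope estimates.

    Assumption (illustrative): both regressors are centered and scaled to
    unit variance (divisor n) with sample correlation r, so that
    X'X = n * [[1, r], [r, 1]] and Cov(b) = sigma2 * inverse(X'X).
    """
    if not -1.0 < r < 1.0:
        # With a perfect linear relationship the separate effects
        # cannot be estimated at all.
        raise ValueError("perfect collinearity: X'X is singular")
    scale = sigma2 / (n * (1.0 - r * r))
    return [[scale, -r * scale], [-r * scale, scale]]


def estimate_correlation(r, n, sigma2=1.0):
    """Correlation between the two slope estimates."""
    cov = slope_covariance(r, n, sigma2)
    return cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
```

In this setup the correlation between the two estimates is simply -r, so positively correlated regressors produce negatively correlated coefficients, and the variance of each coefficient grows by the factor 1 / (1 - r^2) as r approaches one. An overestimate of one coefficient therefore tends to be paired with an underestimate of the other, which is one way a "wrong sign" can arise.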

Nevertheless, the importance of multicollinearity is probably overrated. 
Not all correlations of exogenous variables have serious consequences, 
and not all low t-statistics and wrong signs are the result of multicollinearity. 
Right-hand side variables are often called independent 
variables, but this does not imply that they cannot be correlated. In fact, 
multiple regression analysis is used because there are correlations among the 
"independent" variables. 

The importance of multicollinearity depends crucially upon the statistical 
methods used. 39 Further, the diagnosis of problems is sometimes difficult. 40 
In micro or individual level data, such as those frequently used, the real 
estimation problems caused by multicollinearity are almost certainly not as 
severe as citations would indicate. This problem could, however, explain 
some of the variation in findings across studies since, in the absence of an 
agreed-upon theory of what variables should be included in the models and 
how they should be measured, researchers frequently determine model 
specification on the basis of coefficient significance tests. Variations in the 
sample intercorrelations will yield variations in model specifications under 
such criteria. 



39 Bowles and Levin [24] were correct about the importance of multicollinearity for the Coleman 
analysis. However, the important feature of that criticism, which is often overlooked or 
misunderstood, is the interaction between the choice of statistical analysis (analysis of 
variance) and the correlation of exogenous variables. In the analysis of variance 
framework, any correlation causes trouble. See also Hanushek and Kain [66] and the next 
section of this paper. 

40 Multivariate regression analysis is designed to take into account correlations among the 
exogenous variables. If the exogenous variables are uncorrelated, bivariate regression (or 
simple correlations) will suffice. However, with very high levels of correlation among the 
independent variables, the coefficient estimates become imprecise; in the extreme, with 
perfect linear relationships among exogenous variables, estimation of the independent 
effects of the variables is simply impossible. Diagnosis is often difficult because 
multicollinearity often causes high estimated variances of the coefficients, but such high variance 
can also result from including variables which are unimportant in determining 
achievement. For further discussion and diagnostic aids, see Hanushek and Jackson [65]. 







THE JOURNAL OF HUMAN RESOURCES 



5. Statistical Methods. Throughout the previous discussion, the focus 
has been on the estimation of the parameters of the production process. An 
alternative focus, and a close statistical relative, is contained in analysis of 
variance (see Coleman and others [37]). In this methodology, the observed 
achievement variance is decomposed into that arising from different 
sources. Suffice it to say, this methodology is often inappropriate for the 
questions under consideration (see Cain and Watts [28]), and simply raises 
further, added interpretive questions with no apparent gain. 41 
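
The interpretive difficulty with variance decomposition can be shown with a small calculation. The sketch below is an illustration under stated assumptions, not the procedure of any study cited here: it takes an outcome y and two regressors, all standardized, and uses only their three pairwise correlations (the argument names r1y, r2y, and r12 are hypothetical). It splits the explained variance into a part unique to each regressor and a part that the two explain jointly.

```python
def r_squared(r1y, r2y, r12):
    """R^2 from regressing y on two regressors, given pairwise correlations."""
    return (r1y**2 + r2y**2 - 2.0 * r1y * r2y * r12) / (1.0 - r12**2)


def decompose(r1y, r2y, r12):
    """Split R^2 into unique and jointly explained parts."""
    total = r_squared(r1y, r2y, r12)
    unique1 = total - r2y**2  # gain from adding regressor 1 after regressor 2
    unique2 = total - r1y**2  # gain from adding regressor 2 after regressor 1
    joint = total - unique1 - unique2  # cannot be assigned to either one
    return {"total": total, "unique1": unique1, "unique2": unique2, "joint": joint}
```

With illustrative correlations of 0.5 and 0.4 with the outcome and 0.6 between the regressors, whichever variable is entered first is credited with its full squared correlation (0.25 or 0.16), while the jointly explained share is about 0.14 of the total variance. The allocation of that joint share is arbitrary, and none of these shares says how much achievement would change if an input were changed.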

IV. AN ASSESSMENT OF WHAT WE KNOW AND 
WHAT WE SHOULD DO 

The discussion to this point has indicated a wide range of problems — from 
conceptual problems to technical and esoteric interpretive issues. However, 
the overall analytical power of the production function framework, which 
integrates observations about various aspects of schools, should not be lost. 
Clearly, more detailed theoretical and empirical analyses focusing on 
specific aspects of the production process (such as the mechanisms of peer 
influences or the organizational and decision-making framework of schools) 
have been conducted outside of the context of production function analysis. 
However, they generally suffer from one of two problems: Either they 
concentrate exclusively upon a given attribute of schools or the learning 
process, or they consider the relationship between a particular attribute and 
student outcomes to the exclusion of other attributes that simultaneously 
affect outcomes. While these studies are useful in clarifying the important 
attributes of schools and in describing what goes on in and around schools, 
they are considerably less useful in considering alternative policies with 
respect to schooling. The strength of the production function studies lies in 
their policy relevance through investigation of the independent influences of 
various factors — student characteristics, teacher and school inputs, and 
other environmental attributes — on performance of the schooling system. 

Further, this discussion should not be interpreted as implying that we 
have not learned anything from past research. In fact, there are some 
startlingly consistent findings that are quite robust to some of the problems 
mentioned, and many apparent inconsistencies are significantly reduced 

41 The results of these techniques are sample specific; that is, they depend importantly upon 
the observed sample variations in the dependent and independent variables. Further, 
some of the variation, maybe a significant portion of it, can be accounted for jointly by the 
various independent variables (see Hanushek and Kain [66]), which leaves the choice of 
arbitrarily allocating the joint variation (as in Coleman and others [37]) or simply 
identifying its importance (as in Mayeske and others [99]). In neither case can one indicate 
the expected effect of changing given inputs. 

when the most obvious specification, estimation, and measurement errors 
are taken into account. Not only are these problems not uniformly 
distributed across studies, but also it is often possible to determine the 
direction, if not the plausible magnitude, of many such biases. 

In terms of consistent findings, differences in family socioeconomic 
background without question lead to significant achievement differences. 
Socioeconomic status is interpreted as a proxy for quality of the home 
learning environment. How much arises from factors malleable in the short 
run (current income and consumption, physical surroundings, current 
attitudes, etc.) and how much arises from longer run, less malleable 
attributes (such as parental education, patterns of family rearing, etc.) is 
unknown, although longer run attributes are probably more important. 42 

Second, there is conclusive evidence that differences among schools 
and teachers are important in achievement. Schools simply do not have 
homogeneous impacts on students. Yet, the identification and measurement 
of specific teacher or school attributes which are important is much less 
certain. The variation across studies in specific characteristics that appear 
important in part reflects incomparabilities in underlying samples and data. 
For example, measures of "general intelligence" of teachers appear 
consistently important when considered (see Hanushek [61]), but are most 
frequently unavailable. Part of the variation undoubtedly also reflects 
teacher "skill" differences that are difficult to identify, measure, and model. 
Nevertheless, there is some evidence that school officials can identify more 
productive teachers. While the inability to disentangle the attributes of 
effective teachers indicates difficulty in selecting "good" teachers ex ante or 
in improving teacher productivity, the fact that good teachers can be 
identified ex post indicates that schools can be improved by appropriate 
promotion and allocation decisions. 

Third, there is quite conclusive evidence that schools are economically 
inefficient; that is, they do not employ the best mixes of inputs, given input 
prices and their apparent effectiveness. 43 The possibility of inefficiency in 

42 Analysis of this issue, which deserves added attention, requires longitudinal data on 
individuals, preferably where current family characteristics change significantly. (Data 
generated from the negative income tax experiments in Gary, Seattle, and Denver, for 
example, seem appropriate). Further, it may be even more useful to actually model 
behavior within households. Nevertheless, simply because a current background measure, 
say, family income, appears important, one cannot conclude that changing this will 
increase achievement in the short run since it may proxy other attributes (that aren't 
changed) and may be contaminated by individuals' ability effects (see above). 

43 The evidence on economic efficiency comes from two, almost universal, findings of no 
consistent or significant relationship: (1) between achievement and expenditures per pupil 
(either instructional expenditure or total expenditures); and (2) between achievement and 
specific purchased inputs (teacher experience, teacher education levels, class size, and 
administrative/supervisory expenditures). Teacher experience appears somewhat produc- 
school choices of organization and process, while less conclusive, does not 
however appear overwhelming. 44 
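
Economic efficiency in the sense used here has a standard operational form: at the best input mix, the achievement gained per additional dollar is the same for every purchased input. The sketch below illustrates that condition only; the input names and all numbers are hypothetical and are not estimates from any study discussed in this paper.

```python
# Hypothetical inputs: name -> (marginal achievement gain per unit, price per unit).
HYPOTHETICAL_INPUTS = {
    "input_a": (0.6, 200.0),
    "input_b": (0.2, 400.0),
    "input_c": (0.0, 300.0),
}


def marginal_product_per_dollar(inputs):
    """Achievement gain per dollar for each input."""
    return {name: gain / price for name, (gain, price) in inputs.items()}


def reallocation_gain(inputs, dollars):
    """Gain from moving `dollars` from the least to the most cost-effective
    input with total spending held fixed (a local, linear approximation)."""
    ratios = marginal_product_per_dollar(inputs)
    best = max(ratios, key=ratios.get)
    worst = min(ratios, key=ratios.get)
    return best, worst, dollars * (ratios[best] - ratios[worst])
```

If the ratios differ, as they do in these made-up figures, a school could raise achievement without spending more by shifting money toward the input with the highest ratio. A finding that a purchased input has no measurable effect at a positive price is the extreme case of a ratio of zero.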

Fourth, there are significant differences in production functions by race 
and, perhaps, family background; that is, school resources interact 
importantly with the background characteristics of individuals. 45 On the 
research side, this implies exercising considerable care in modeling efforts, 
particularly when forced to use aggregate data. On the policy side, it 
indicates that more attention should be given to the internal allocation of 
resources. If inputs were equally effective for all students, shuffling teachers 
(without changing the pool) is a zero-sum game— winners balance against 
losers; this is not the case when input effectiveness varies across students. 
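
The zero-sum argument can be checked by enumerating assignments. This is a minimal sketch with made-up effect sizes: effect[t][g] is assumed to be the achievement gain when teacher t is assigned to student group g, one teacher per group (the names and numbers are illustrative only).

```python
from itertools import permutations


def total_achievement(assignment, effect):
    """Total gain when teacher assignment[g] teaches student group g."""
    return sum(effect[teacher][group] for group, teacher in enumerate(assignment))


def totals_by_assignment(effect):
    """Total gain for every one-to-one assignment of teachers to groups."""
    n = len(effect)
    return {p: total_achievement(p, effect) for p in permutations(range(n))}


# Teacher 0 is better than teacher 1 by the same amount for both groups.
UNIFORM_EFFECT = [[3, 3], [1, 1]]
# Teacher 0 is better only with group 0 (an interaction with background).
VARYING_EFFECT = [[3, 1], [1, 1]]
```

With the uniform effects every assignment yields the same total, so reassigning teachers only moves gains from one group to another. With the varying effects the totals differ across assignments (4 against 2 in this example), so the internal allocation of a fixed pool of teachers changes aggregate achievement.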

In addition to these significant substantive results, there are, however, 
a number of significant gaps in our knowledge. Several of these have come 
out in the previous discussion and require only summarizing here. 

The primary gap in understanding — at both a conceptual and empirical 
level — is an inadequate picture of the relationship between school quality and 
subsequent performance and therefore of how to measure school quality. 
The previous findings on the operations of the school system relate almost 
exclusively to test-score measures of achievement, even though validity of 
this measure is quite uncertain. It is difficult to overemphasize the importance 
of pursuing this line of inquiry. 46 

Beyond this, there are also a series of uncertainties which are amenable 
to research. For example, the influence of peer compositions — which is 
central to such important questions as integration and ability-tracking 
policies — remains murky. 47 Also, the dynamics of the educational process 



tive, but clearly less productive at the margin than its cost based upon typical salary 
schedules. The others do not appear to show any positive relationships, let alone 
relationships strong enough to justify their costs. 

44 Relatedly, little information about the decision process which dictates organizational and 
process determination is available. While not essential for understanding the production 
process per se, this might be very relevant for developing appropriate strategies to institute 
change. Available work on introduction and implementation of "innovations" (really 
process changes) includes Berman and McLaughlin [15] and Silkman [127]. 

45 Previous work indicates that stratification by race or SES or inclusion of interaction terms 
among school resources and backgrounds consistently shows significant achievement 
relationship differences: for example, Hanushek [61] or Summers and Wolfe [134]. 

46 While data availability is a clear problem, more can be done along these lines even with 
existing data. Considerable individual data with information about qualitative aspects of 
schooling are becoming available, but have not been fully exploited for linkages with 
education analyses. For example, Sewell and Hauser [126] have data about specific schools 
attended and lifetime outcomes, as does part of the Survey of Income and Education. The 
Michigan Panel Study of Income Dynamics and the National Longitudinal Surveys have a 
variety of test measures along with outcome data. Each could potentially provide more 
information about school quality and subsequent outcomes. 

47 In many studies, data limitations (either because of aggregation problems or incomplete 

are imperfectly understood. Most studies have been cross-sectional and rely 
upon specialized samples with little information about the impacts of 
resources for different age-school year cohorts. Therefore, given other 
incommensurates, it has been very difficult to analyze such issues as the 
differential impact of early school programs or varying achievement 
patterns. 48 As a final example, there has been little investigation of the 
stability of teacher skills, or individual teacher effects, over time. 

Many of the analytical problems are ones where the conceptual 
problems are minimal, but data problems are severe. 49 In particular, the 
current methodology is probably quite adequate for many further analyses 
of elementary education where, among other things, there are fewer 
measurement problems and generally simpler organizational structures. 
There simply seem to be large, immediate payoffs to collecting new data for 
a variety of school situations where the most significant measurement 
questions indicated above are avoided and where longitudinal information 
can be obtained. 50 

Another set of problems — ones where conceptual inadequacies appear 
paramount — is nevertheless very significant. They have occurred in the 
previous discussion and include measures of alternative outputs, investiga- 
tion of decision processes with regard to alternative output mixes, identifica- 
tion and measurement of process and organizational variables for both 
schools and classrooms, and expansion of our models for more complicated 
realities such as high schools. 

V. CONCLUDING REMARKS 

While the primary motivation for analyzing educational production relation- 
ships is derived from the sector's important resource usage, from its 

information) preclude analysis of peer influences. The studies directly considering peer 
influences (Hanushek [61] and Henderson and others [70]) yield conflicting results. The 
research into racial composition has generally been flawed by limited historical information. 

48 Several studies have included measures of preschool programs (e.g., Hanushek [61], 
Ritzen and Winkler [120]). These generally show more impact than is found in direct 
analyses such as that of Headstart (Cicirelli and others [35]). 

49 One clear problem has been the use of specialized samples which relate to limited 
situations and which (because generally collected for other purposes) are not always 
appropriate. Historically, however, there seems to be a bias against data collection. 
Considerable money was spent on reanalyzing the data from the Coleman Report, even 
though it was well recognized that there were many, near-fatal flaws in those data, while 
none was spent on simply gathering better data. 

50 A panel study design could, in addition to allowing dynamic analyses, be used to minimize 
remaining problems such as missing data on individual abilities; see Boardman and 
Murnane [19] or Hanushek and Jackson [65]. 

importance as a policy instrument, and from concerns about efficiency given 
the lack of market incentives, such analyses also have direct linkages to 
other research and policies. For example, the entire focus of school finance 
system discussions has been the distribution of expenditures per student 
under alternative financing mechanisms, even though the apparent levels of 
economic inefficiency in education indicate that this has little to do with the 
distribution of educational services. Similarly, many studies (found in public 
finance or urban economics) control for the quality of governmental services 
by including school expenditures per student with little consideration of 
what this is actually measuring. While part of school integration discussions 
relate to the effect of racial composition on achievement, ambiguities in this 
work have led to considerable confusion in this area (see Clune [36]). As a 
final example, even though years of schooling is a clearly inadequate 
measure of individual skill and ability differences, most contemporary labor 
economics and sociological research deals at only this level. 

While other examples are easy to compile, the message should be clear: 
Understanding the educational sector has important ramifications for 
understanding many other areas, but the treatment of education has, for the 
most part, been quite superficial. 

Multiple Choice Questions in 
Elementary Economics 



The mark of maturity in a science is rigorous hy- 
pothesis-testing. Economic education is a new field in the sense that 
only a little hypothesis-testing has been done so far. This is why 
it is an exciting field — it is full of opportunities. 

Hypothesis-testing requires measurement, quantification, and 
data. One reason there has been little hypothesis-testing in economic 
education has been the lack, until recently, of objective 
measuring instruments. 

Important as rigorous hypothesis-testing is for the advancement 
of economic education, hardheaded evaluation by each teacher 
of his own accomplishments could also contribute a great deal. 
College teachers of economics, like the government, have commonly 
measured their accomplishments by their input instead of their 
output. In GNP data, the Department of Commerce estimates the 
output of the government sector by the value of the inputs rather 
than by the value of the output because there has been no alter- 
native. College economics teachers have judged themselves and 
their fellows by the amount of sophisticated economics they have 
put into their courses, not by how much the students have gotten 

* I am indebted to Paul L. Dressel, George P. Hollenbeck, and Allen C. 
Kelley for helpful comments on the first draft of this paper. None of them 
has seen the final draft, and no one is to blame for the shortcomings of the 
paper except myself. 

out of them, because there has been no alternative. College teachers 
who deplore casual empiricism in research have relied on nothing 
else in evaluating teaching. 

In the last few years, this situation has changed rapidly. There 
are now available at least three good multiple choice tests useful 
for compiling data, testing hypotheses, and evaluating teaching. 1 
Since most economists know little about the principles of good 
test construction, the first section of this paper will summarize the 
main points. To illustrate them, the second section discusses some 
multiple choice questions written by the participants in the Stan- 
ford Seminar on New Developments in the Teaching of Econom- 
ics. The third section will present some data derived from the Test 
of Understanding in College Economics. 

I 

The most important principles governing construction of objective 
tests — particularly tests to be published — are these: 

1. The first step is to draw up specifications. Normally, specifications 
for a test consist of a two-way classification giving the 
percentages of questions to be asked. One classification is by sub- 
ject matter. For example, in the College Board's test in introduc- 
tory economics, 40 percent of the questions are on macroeconomics, 
40 percent on microeconomics, 10 percent on international trade, 
and 10 percent on comparative economic systems. 2 The other classification 
specifies the type of question or type of material. For example, 
the specifications for the College Board's test state that 
15 percent to 20 percent of the questions test "Ability to apply 
simple models and use analytic tools." 3 

1 The Test of Economic Understanding (for high school students), published 
by Science Research Associates for the Joint Council on Economic 
Understanding, 1964; the Subject Examination in Introductory Economics 
of the College Entrance Examination Board's College-Level Examination 
Program (for the end of a full-year college course); and the Test of Under- 
standing in College Economics (described below). 

2 As two of these categories are very broad, they presumably were broken 
down into such subcategories as money and banking, national income analysis, 
and so on, in the actual work of building the test. 

3 Study of the questions actually used in the test has shown that "apply" 
refers to questions with hypothetical rather than real data, to artificial rather 
than to genuine situations. The College Board's test contains almost no questions 
of the kind discussed in Section II and the Appendix to this paper. 

2. Multiple choice questions are better than true-false questions. 
Except for unduly easy questions, it is much more difficult to write 
true-false questions for which a group of experts will agree on what 
the right answer really is. Moreover, true-false questions encourage 
guessing. Because there are only two choices, the options do not 
present a continuous line of thought and do not force the student 
into clear-cut discriminations. True-false questions therefore reveal 
less about what goes on in the minds of students. The incorrect 
answers to multiple choice questions can be written to reveal the 
nature of the incorrect thinking. (Multiple choice questions normally 
have either four or five options.) My experience indicates 
that finding a fifth option that is both plausible and wrong is more 
trouble than it is worth. 

3. Questions must be tried out on a large number of students to 
weed out those that do not work. "Large number" means enough 
to give reliable results, preferably several hundred. Data is needed 
to show that (a) the proportion of students getting the right answer 
is in the appropriate range (questions that all get right or all get 
wrong are not useful for discriminating among students); (b) there 
is positive correlation (preferably a coefficient of 0.30 or better) 
between right answers to a given question and an index of the 
quality of the students; and (c) all of the wrong answers are 
selected by some students ("distractors," as wrong responses are 
called, that are never chosen make the test inefficient in the sense 
that testing time is used in a way that contributes no data useful 
for discriminating among students). 

4. All questions must be carefully edited by both an economist 
and a psychometrician. Editing is difficult, demanding, and time-consuming. 
The economist must make sure that the right answer 
really is right and that every one of the wrong answers is indisputably 
wrong. 4 The psychometrician must make sure that the question 
conforms to high standards of test construction from a technical 
point of view. (For example, testing experts look for "giveaways" 
—language that enables a test-wise student to detect the right 
choice without knowledge of the subject.) 

4 If the student is to be presented with two correct options, a key must be 
used to prevent ambiguity. (Psychometricians frown on the device of making 
the last option "all of the above" and similar expedients since the preceding 
options are not unambiguously wrong.) The two correct choices can be 
labeled I and II in the key and the options made to read: 1. I only. 2. II 
only. 3. Both I and II. 4. Neither I nor II. 

5. If the user wants to know more than just the relative ranking 
of a group of students, the test in final form needs to be adminis- 
tered to hundreds or thousands of students representing a good 
cross-section of the population we are interested in to provide 
norming data. 

6. A good multiple choice test in economics can measure a sub- 
stantial part of student attainment but not all of it. In this respect, 
economics is somewhere between mathematics and history. The 
scores on good multiple choice tests in mathematics correlate so 
highly with grades on other forms of examination that for all practical 
purposes the correlation can be regarded as 1.0. 5 In economics, 
the correlation coefficients may be expected to range between 0.60 
and 0.75. In a subject like history, the correlation is lower. To do 
justice to individual students of economics, course grades should not 
be based solely on objective tests. Essay tests are needed also. 
But multiple choice tests can be good enough to provide useful data 
for hypothesis-testing and evaluation of teaching practices, pro- 
vided the data is interpreted with the same awareness of its limita- 
tions as is required in using the very imperfect data on gross na- 
tional product and price levels that economists use in their daily 
work. 

7. Writing good multiple choice questions is difficult. It requires 
an aptitude that can be developed with practice but is not uniformly 
distributed among economists. The advantages of specialization 
and division of labor apply to writing multiple choice questions — 
more so than to essay tests. Essay questions are easy to write but 
hard to grade. Objective questions are easy to grade but hard to 
write. 

8. Routine questions calling for "book" answers and artificial 
questions using hypothetical data are easier to write than questions 
requiring application of economic principles to new, realistic situa- 
tions, including applications to policy questions. 
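
The three tryout checks listed under principle 3 can be computed directly from response data. The following is a minimal sketch rather than a description of how any published test was built. The 0.30 threshold comes from the text; the acceptable difficulty range is left unspecified there, so the default p_range below is an assumption, as are the function names.

```python
def item_statistics(responses, key, total_scores):
    """Item statistics for one multiple choice question.

    responses: option chosen by each student; key: the correct option;
    total_scores: an index of each student's quality (e.g., total score).
    """
    n = len(responses)
    correct = [1.0 if r == key else 0.0 for r in responses]
    difficulty = sum(correct) / n  # proportion answering correctly
    mean_s = sum(total_scores) / n
    cov = sum((c - difficulty) * (s - mean_s) for c, s in zip(correct, total_scores))
    var_c = sum((c - difficulty) ** 2 for c in correct)
    var_s = sum((s - mean_s) ** 2 for s in total_scores)
    # Pearson correlation between right/wrong and the quality index;
    # undefined when everyone (or no one) answers correctly.
    discrimination = cov / (var_c * var_s) ** 0.5 if var_c > 0 and var_s > 0 else None
    counts = {}
    for r in responses:
        counts[r] = counts.get(r, 0) + 1
    return difficulty, discrimination, counts


def flag_item(responses, key, total_scores, options, min_r=0.30, p_range=(0.20, 0.90)):
    """Return reasons to revise or drop the question (empty list if none)."""
    difficulty, discrimination, counts = item_statistics(responses, key, total_scores)
    flags = []
    if not p_range[0] <= difficulty <= p_range[1]:
        flags.append("difficulty outside range")
    if discrimination is None or discrimination < min_r:
        flags.append("weak discrimination")
    unused = [o for o in options if o != key and counts.get(o, 0) == 0]
    if unused:
        flags.append("unused distractors: " + ", ".join(unused))
    return flags
```

A question that passes the first two checks can still be flagged because one of its wrong options is never chosen, which is the inefficiency described in (c).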

To the above list of well-established generalizations on construc- 
tion of multiple choice tests, let me add two helpful hints. One is 
that multiple choice questions can be extremely useful as a teaching 
device. An example is their use at the University of Wisconsin, as 
reported by Allen Kelley, for weekly diagnosis of student progress 

5 Of course, the computed correlations are lower, partly because of sampling 
errors. 

and tailoring assignments of individual students to test results. 6 In 
addition, questions on which the class as a whole had mixed or poor 
results can be used in lectures and class discussion to clarify points 
that have not gotten across. 

The other helpful hint is that a multiple choice test in which 
the student is required to give a brief explanation of his answer can 
be a useful compromise that avoids the most serious drawbacks of 
essay tests and ordinary multiple choice tests. Most teachers are 
not in a position to construct good multiple choice tests themselves, 
since they are unable to try the questions out on hundreds of stu- 
dents before using them or to hire a psychometrician to help edit 
them. On the other hand, essay questions are subject to wide vari- 
ability in grading. Essay questions should be graded independently 
by at least three economists, and the grades averaged, a procedure 
that is usually impractical. But if the student is asked a multiple 
choice question and must explain his answer briefly, any deficiencies 
in the question itself can be detected and allowed for, and the 
teacher gets quasi-objective data on how much or how little the 
students have learned. 

II 

In June 1968, the professors participating in the Seminar in New 
Developments in the Teaching of Economics took the Test of Un- 
derstanding in College Economics. There followed a lengthy discus- 
sion of the results, which are reported in the next section of this pa- 
per. Suffice it to say for the moment that the extraordinary skill 
economists generally have for detecting flaws, real or imagined, in 
other people's multiple choice questions was exhibited to the full on 
this occasion. The participants were then challenged to try to do 
better themselves. They were given four genuine quotations and 
asked to write one or more multiple choice questions on them or on 
a genuine quotation or realistic situation of their own choosing. A 
modest prize of $25 was offered for the best question. 

Twenty-three of the forty participants in the seminar submitted 
a total of twenty-six questions. Eighteen of the questions were 

6 Allen C. Kelley, "An Experiment with TIPS: A Computer-aided Instructional 
System for Undergraduate Education," American Economic Review, 
May 1968, pp. 446-57. 

based on the quotations I supplied, the other eight being entirely 
original. 7 Three of the participants submitted a total of four ques- 
tions that seemed to me reasonably good. Some others had possibili- 
ties but needed considerable reworking. The best questions were 
based on two of the quotations I furnished, with two of the wholly 
original questions having possibilities. 

In interpreting these results, it must be borne in mind that the 
time available for writing questions was extremely short. The con- 
test was announced late one afternoon. The deadline for submitting 
entries was the next morning. On the other hand, the professors 
were to some extent a select group with a special interest in teach- 
ing. Moreover, the questions were judged by the loose standards ap- 
propriate for items that had not been edited or tried out on stu- 
dents. 

Subject to such qualifications, the results accorded with expecta- 
tions. Questions requiring application of economic principles to 
genuine quotations or situations are hard to write. I doubt if as 
many as 10 percent of all economists have any real aptitude for 
such work. 8 

The prize-winning question was submitted by C. G. Williams of 
Boston College. It was based on the following quotation (from an 
economist whose identity I prefer not to publish for reasons that 
will be obvious): 

Some decades ago clean laundry was produced in the household by means 
of hand laundering. Later, toward the end of the 19th century as mechanical 
washers were developed, this activity moved into commercial laundries, 
in the business sector. Presently the movement is back the other way, from 
the business sector to the household, where home laundering devices are 
rapidly becoming commonplace. The first move was a simple case of 
mechanization and realization of economies of large scale production. The 
act of laundering became economically more efficient because machines 
took over from human drudgery. In the second move, however, economic 
efficiency was lost because home washing and drying machines operate on 
a small scale and remain idle during much of their useful life. 

7 Original in the sense that the participant found the quotation or situation 
himself. The terms of the contest prohibited artificial situations or "quotations" 
invented for the occasion. 

8 When I told a member of the Committee for a College-Level Test of 
Economic Understanding that the results of the contest accorded with my 
expectations, he said I was cynical. I disagree. To say that few economists 
can write multiple choice questions of a realistic kind is no more cynical 
than to say that most economists are not econometricians. 

Williams' question is reproduced here without any editing. 

Is this quotation essentially correct or incorrect and why? 

(a) Correct because it means that housewives waste a lot of effort doing 
laundry when such work can be more easily done by commercial
laundries. 

(b) Correct because large-scale operations, especially in a business like 
laundering, are always cheaper, per unit of output, than small-scale 
operations. 

(c) Incorrect because, as with home appliances, commercial laundry 
machinery is not worked continuously and similarly is subject to 
depreciation and obsolescence. 

(d) Incorrect because the process has taken place in accordance with 
the preferences of the household regarding the allocation of house- 
hold time, effort, and income. 

The question has the merit of testing the ability of students to 
detect an error in the use of an important concept: economic effi- 
ciency. Its shortcomings are remediable. The quotation would have 
to be shortened, unless other questions were to be based on it, be- 
cause it is much longer than necessary. The question preceding 
the options should refer to the correctness of the last sentence of 
the quotation rather than to the whole of it. The word "allocation"
in option (d) needs to be replaced by "use," since such a favorite
word of economists may tip off the right answer to test-wise stu- 
dents. Since response (c) could be defended as possibly correct, 
though clearly inferior to (d), it might be desirable to specify that 
the student is to choose the best response. Some revision of lan- 
guage and punctuation would be routinely made in the process of 
editing. 

At the time the questions were judged, there was no way to know
whether the one by Williams would work with students. Subsequently,
I tried it out on twenty-six schoolteachers who had just
been through the equivalent of a course in elementary economics
at Georgia State University. 9 None chose (a), seven chose (b), one
chose (c), seventeen (two-thirds of the group) chose the right an- 
swer, (d), and one did not answer the question. The results suggest 

9 On the Test of Understanding in College Economics, Part I, their mean
score did not differ significantly from the national average of college students.
The laundry question was administered in an undesirable way. It was shown
on a screen rather than duplicated and distributed. Some participants were
too far from the screen to be able to read it clearly. The results may have
been adversely affected in consequence. 
that the question is in the right range of difficulty. Option (b) is an
effective distractor (and shows how badly oversold Americans are 
on the economies of large-scale production). There is reason to fear 
that options (a) and (c) are ineffective, but a larger sample would 
be needed to arrive at a judgment. 10 
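
The tally from the twenty-six schoolteachers can be laid out as a small item analysis. The sketch below is in Python, which is not part of the original study. It assumes the count for option (a) is zero, the reading under which the responses sum to twenty-six, and the 5 percent cutoff for calling a distractor ineffective is an arbitrary illustration, not a rule taken from the text.

```python
# Item analysis for the laundry question (Georgia State tryout).
# Counts follow the text; reading (a) as zero is an assumption that
# makes the responses sum to the twenty-six participants.
responses = {"a": 0, "b": 7, "c": 1, "d": 17, "no answer": 1}
KEY = "d"
CUTOFF = 0.05  # hypothetical threshold for a "working" distractor

total = sum(responses.values())
shares = {option: count / total for option, count in responses.items()}

# Distractors that attract less than CUTOFF of the group.
weak_distractors = [
    option for option, share in shares.items()
    if option not in (KEY, "no answer") and share < CUTOFF
]

for option, share in shares.items():
    print(f"{option:>9}: {responses[option]:2d}  ({share:.0%})")
print("weak distractors:", weak_distractors)
```

With so few respondents, a single extra choice of (a) or (c) would move either option across any such cutoff, which is why a larger sample is needed before judging them.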

Later, I gave the question to 150 students at Vanderbilt University
at the end of the first week of economics instruction. The students
were required to give a brief explanation of their answers.
Although a very large majority chose the correct option, all the 
distractors proved effective. The explanations were generally disap- 
pointing. Whether the students chose the right response for the 
wrong reason, or whether they lacked the skill needed to com- 
municate successfully, or whether they were given too little time is 
unknown. The fact that the question was given so early in the 
course undoubtedly accounts in part for the poor explanations. 11 

Some of the other questions submitted in the competition are discussed
in the Appendix to this paper.

III

Table 1 compares the scores on all four forms of the Test of Understanding
in College Economics of the forty participants in the
Stanford seminar (called "professors" in the table) with the norming
data based on a nationwide sample of students. Since the TUCE
has been discussed at length elsewhere, only a brief description will
be given here. 12 Construction of the test was proposed by the
Committee on Economic Education of the American Economic Association
and carried out by the Joint Council on Economic Education
through a committee of economists and psychometricians and
The Psychological Corporation. 13 There are two forms (called A

10 No data were collected on the relation between right answers to this 
question and the ability of the participants. Verification that the correlation 
coefficient is significantly higher than zero would be needed before the question
could be accepted for use in a multiple choice test.

11 The laundry question was preceded on the quiz by the question "What
is meant by 'economic efficiency'?" About one-quarter of the answers to that
question were given a grade of F.

12 Besides the Manual for the TUCE, see Rendigs Fels, "A New Test of
Understanding in College Economics," American Economic Review, May
1967, pp. 660-66; and Arthur L. Welsh and Rendigs Fels, "Performance on
the New Test of Understanding in College Economics," ibid., May 1969.

13 The members of the committee were G. L. Bach, William G. Bowen,
Paul L. Dressel (Executive Director), Rendigs Fels (Chairman), R. A. Gor- 
TABLE 1. Comparison of Scores on Test of Understanding
in College Economics (percent correct)

[Per-question entries for professors, students, and the difference between
them on the four forms of the test are not legible in this copy. Only the
row of means survives, given here in the order printed.]

Means: professors 88, students 56, difference 31; professors 91, students 58;
professors 82*, students 57*, difference 25*; professors 90, students 55.

* Scores on question 31 of Part II, Form A low because of an error on the preliminary forms. For students,
the estimated norm on the corrected question is 41 percent.

Source: Students: Manual for Test of Understanding in College Economics (New York: The Psychological
Corporation, 1968), p. 15; professors: Stanford Seminar in New Developments in the Teaching of Economics
(see text). The averages (means) shown here for students were computed by averaging the (rounded)
data in the TUCE Manual (p. 18) and differ slightly from the means given by ibid., p. 18, which were
computed from unrounded data.



and B) for the content of each semester of the typical elementary
college course in economics. (The forms for the first semester,
called Part I, are mainly on macroeconomics; those for the second
semester, called Part II, cover microeconomics, comparative economic
systems, and international economics.) The specifications determined
by the committee conformed, with respect to subject matter
content, as closely as possible to current teaching practice; but,
with respect to the kind of question asked, the committee decided
on a departure: only one-third of the questions were to test "recognition
and understanding," with one-third requiring simple applications
of economic knowledge, the other third complex applications.
In consequence, the TUCE closely resembles the College Board's
test only in subject-matter coverage. The TUCE has fewer textbook-type
questions, more requiring applications to realistic situations
or genuine quotations.

don (1965-67), B. F. Haley (1967-68), Paul A. Samuelson, and George J.
Stigler. John M. Stalnaker served as consultant.

Table 1 verifies that the TUCE is a valid instrument in the sense 
that those who know more economics get higher scores. In the na- 
tional sample, students who had received college instruction in 
economics averaged about 19 right out of 33 questions on all four 
forms, whereas the participants in the Stanford seminar scored 29 
on three of the forms and only slightly less on the fourth. It is evi- 
dently a hard test, suggesting that it might be useful at a more ad- 
vanced level than the elementary course. 14 
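
The arithmetic behind this comparison is easy to check. The sketch below (Python, with names chosen purely for illustration) converts the raw scores quoted above into percentages of the 33 questions on a form. The figure of 29 is the one reported for three of the four forms; the fourth, slightly lower, score is not given as a raw number in the text and is left out.

```python
# Convert raw TUCE scores (out of 33 questions per form) to percent correct.
QUESTIONS_PER_FORM = 33

def percent_correct(raw_score: float, n_questions: int = QUESTIONS_PER_FORM) -> float:
    """Return the score as a percentage of the questions on one form."""
    return 100.0 * raw_score / n_questions

student_pct = percent_correct(19)    # national sample, about 19 right
professor_pct = percent_correct(29)  # Stanford seminar, 29 right on three forms

print(f"students:   {student_pct:.0f} percent")
print(f"professors: {professor_pct:.0f} percent")
print(f"gap:        {professor_pct - student_pct:.0f} points")
```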

More interesting than the over-all results are the scores for in- 
dividual questions. In 124 of the 132 questions, the percentage of 
professors getting the right answer was at least 10 points higher 
than for the students. In five of the other eight cases, the professors
scored less than 10 percentage points higher largely because the 
question was easy. For example, 95 percent of the students an- 
swered question 4 of Part II, Form A, correctly. The greatest pos- 
sible margin for the professors over the students was 5 percentage 
points; the actual margin was 2 points. The remaining three questions
(nos. 11 and 13 on Form A of Part I and no. 11 on Form A
of Part II) will be discussed individually in a moment. On 111 
questions, 80 percent or more of the professors chose the right re- 
sponse. The 13 questions on which they scored less than 70 percent 
(which include two questions for which a low or negative margin of 
improvement over the students cannot be explained on grounds of 
easiness) merit individual attention. 
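
The ceiling argument can be made explicit. In the sketch below (Python; the function names are illustrative and not taken from the TUCE Manual), the largest margin professors can show over students is whatever is left between the student score and 100 percent, and the actual margin can then be read as a share of that room.

```python
def possible_margin(student_pct: float) -> float:
    """Largest margin (in percentage points) professors could show over students."""
    return 100.0 - student_pct

def share_of_possible(student_pct: float, professor_pct: float) -> float:
    """Actual margin as a fraction of the largest possible margin."""
    return (professor_pct - student_pct) / possible_margin(student_pct)

# Question 4 of Part II, Form A: 95 percent of students correct and a
# 2-point margin, so the professors' figure is taken to be 97 percent.
q4_students, q4_professors = 95.0, 97.0

print(possible_margin(q4_students))
print(share_of_possible(q4_students, q4_professors))
```

The second function is undefined when every student answers correctly, since there is then no room for a margin at all.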

For one of the 13 questions (namely, II-A-31), a key sentence 
had been omitted from the stem in the preliminary forms used for 

14 A mean score for students in the vicinity of 57 percent is too low for
purposes of giving course grades under the traditional scale of 70 percent 
for a C, 80 percent for a B, and so on, but is just about right for a test 
designed to be used for evaluation and research at a wide variety of colleges, 
including the best. 
both professors and students. The error has been corrected in the
final form as published by The Psychological Corporation. 15 

II-A-23 is the only other question on which the professors scored 
less than 50 percent. This question is on marginal productivity 
analysis. Hypothetical data given in the stem imply that the firm 
could increase profits by increasing the use of both factors of pro- 
duction. Apparently professors and students alike are so used to 
thinking in terms of factor substitution in the context of marginal 
productivity that they lost sight of profit maximization. 

In Part I, Form A, question 11 is based on a genuine quotation 
from a popular magazine. The analysis implicit in the quotation 
is faulty, and the question was designed to test the ability of the
student to see through it. Although the question does not discrimi- 
nate effectively between professors and students, the results are 
interesting. Students tested before studying economics did virtually 
as well (57 percent) as at the end of the first semester (58 per- 
cent). Moreover, this is the only question on which the professors 
did worse (55 percent) than the students. There are two possible 
explanations. The question may be faulty, or economics training 
may contribute little to reading popular magazines critically, or — 
more likely — both. One of the members of the Stanford Seminar 
argued that one of the wrong responses could be defended as cor- 
rect. Since only 10 percent of the professors (and a similar pro- 
portion of the students both at the beginning and end of the first 
semester of economics) chose that response, the shortcomings of 
the question do not seem to be the major part of the explanation. 
Although evidence from one question means little, the results sug- 
gest what I believe to be the case: that teachers of elementary 
economics rarely make any effort to train their students to use 
their knowledge of economics in reading newspapers and maga- 
zines. Consequently, neither students nor teachers acquire skill 
along these lines, and the popular press can get away with publish- 
ing bad economics. 

15 I am indebted to the members of the Stanford Seminar for pointing out
the error in time to make the correction. It was possible to estimate the 
national norm for students from tryout data. The estimate is 41 percent, 
rather than the 19 percent shown in Table 1. The data in Table 1 for this 
question are based on what would have been the right response if there 
had been no error in the stem. On the basis of what I consider to be the 
correct answer to the question as actually asked, 51 percent of the professors 
and 46 percent of the students got it right. Such figures mean little except 
to indicate that even with the right key the question as asked was faulty. 
Questions 12, 13, and 14 of Part I, Form A, are all based on a
long quotation from a newspaper story. In each case, the professors'
score was low and, in one case (no. 13), it was only slightly
higher than the students' score. In each case, the students did 
better after a course in economics than before, and the correlation 
coefficients between this item and scores on the test as a whole 
were acceptably high. Thus the questions contribute toward dis- 
criminating between those who know less and those who know more 
—but not as much as one could wish, illustrating the difficulty of 
writing this kind of question. 

Question 23 of Part I, Form A, is a straightforward item asking 
which of four limitations on Federal Reserve policy is most serious 
for combatting recession. Students do markedly better on this 
question after taking economics than before. The professors scored 
markedly higher than the students, but not very high— 58 percent. 
According to the key, the right answer is lack of control over the 
velocity of money. The most popular incorrect response among 
both professors and students was inability to control the level of 
demand deposits. The inference seems to be that although some 
professors teach and use the velocity concept, quite a few do not. 16 

Question 25 of Part I, Form A, gives a short genuine quotation 
from a news source attributing a large rise in the money supply 
"primarily" to federal deficits. It is followed by some facts showing 
that (1) the federal deficit in the period in question was only 
about a third as large as the excess of business spending over sales 
and (2) Federal Reserve credit rose sharply. The question asks 
if the facts "support" what the quotation says about the rise in the
money supply and why. The exact word used is crucial to the
validity of the question. The facts do not "support" the conclusion, 
since they suggest that the major source of the increased quantity 
of money was bank credit to businesses. Of course, the facts do not 
necessarily contradict the quotation, since the federal deficit might 
have been the primary stimulus to increased demand for and supply 
of credit. That only two-thirds of the professors got the question 

16 Despite some argument at the Stanford seminar, I do not believe there
can be any doubt about which of the two limitations discussed in the text
above is the more serious. Among students, inability to control the outflow 
of gold was almost as popular a response as inability to control the level of 
demand deposits and was chosen more frequently after a semester of eco- 
nomics than before— an illustration of how a wrong answer can reveal in- 
creased understanding. 
right indicates that the question is hard, requiring careful reading
and analysis. It is further evidence that teachers make little effort 
to train students in evaluating popular statements about eco- 
nomics. But this time, as one would expect, the professors did 
better than the students. The question effectively discriminates 
degrees of knowledge of economics — 25 percent right for students 
before taking college economics, 39 percent after one semester, 68 
percent for professors, and a correlation coefficient for students be- 
tween this question and total score on the test well above the ac- 
ceptable minimum. 
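
The correlation between a single question and the total score is an item-total correlation. A minimal sketch, in Python with invented data (the eight scores below are hypothetical, not TUCE results), computes it as an ordinary Pearson correlation between a right/wrong indicator and the total score. Whether the TUCE norming used exactly this statistic, or a variant that excludes the item from the total, is not stated in the text.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: 1 = answered the item correctly, 0 = did not.
item_right = [1, 1, 1, 0, 1, 0, 0, 0]
total_score = [30, 28, 25, 24, 22, 18, 15, 12]

r = pearson(item_right, total_score)
print(f"item-total correlation: {r:.2f}")
```

A value near zero would mean that getting the item right says little about overall performance, which is the situation the "acceptable minimum" is meant to rule out.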

The scores on no. 11 of Part II, Form A, make it appear easier 
than the average TUCE question. Nearly three-quarters of the 
students and four-fifths of the professors got it right. Yet the pro- 
fessors scored only 7 percentage points higher than the students 
out of a possible 27 points. The question asks about the conditions 
under which a rise in wages will cause a substantial fall in employ- 
ment. The three distractors all name conditions that tend to limit 
the effect on employment. The fourth is of the "none-of-the-above" 
type. Since elimination of each of the three distractors requires a 
fair amount of analysis, the professors' score is not surprising in 
view of the time limit. The puzzle is how the students managed 
to do so well. One suspects that a good many baffled undergraduates 
resorted to "none-of-the-above" at a guess. 

In Part II, Form A, question 22 reproduces from a newspaper 
column an estimate of the cost of operating a used car for a year, 
broken down into major categories. There are two errors in the 
method of calculating costs: double counting (both depreciation and
the purchase price of the used car are included) and omission of 
an opportunity cost (interest on the owner's capital). All the pro- 
fessors spotted at least one of the errors, but only 70 percent saw 
them both, compared to 52 percent for the students. 
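
The two errors are easy to see with numbers. The sketch below is in Python, and every dollar figure and the interest rate are hypothetical, since the newspaper's own estimate is not reproduced in the text. It sets the column's method beside one that counts the car's capital cost once, as depreciation plus forgone interest.

```python
# All figures are hypothetical; the newspaper's own numbers are not
# reproduced in the text.
purchase_price = 1200.0   # paid for the used car
depreciation = 300.0      # loss of value over the year
running_costs = 500.0     # all other operating outlays for the year
interest_rate = 0.06      # what the owner's capital could earn elsewhere

# The column's method: counts the car twice and ignores forgone interest.
column_estimate = purchase_price + depreciation + running_costs

# Corrected method: capital cost is depreciation plus forgone interest.
opportunity_cost = interest_rate * purchase_price
corrected_estimate = depreciation + opportunity_cost + running_costs

print(f"column's estimate:  {column_estimate:.2f}")
print(f"corrected estimate: {corrected_estimate:.2f}")
```

The two errors pull in opposite directions: double counting overstates the cost, and omitting interest understates it.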

Question 27 of Part II, Form A, requires a difficult application 
of economic theory to a realistic situation. Hard though the question
is, it discriminates effectively, the professors scoring 53 percent
compared to 36 percent for the students. 17 In this kind of question, 
it is easy for well-qualified people to get the direction of change 
wrong, especially under pressure of time. Question 30 of Part II, 

17 Unfortunately, for Part II we do not yet have data on how well students
perform at the beginning of the semester, but the correlation between II-A-27 
and the total score on the same form is acceptably high. 
Form A, is (like II-A-23) a numerical example, this one on marginal
revenue. The professors scored 64 percent compared to 34 percent
for students. Question 33 on the same form (professors: 56
percent; students: 26 percent) is a difficult abstract question on 
the effect of a deficit on current account in the balance of pay- 
ments on interest rates and prices. Most likely, the time limit af- 
fected the scores of professors more than the scores of students on 
all such questions. Professors have a good enough grasp of eco- 
nomic analysis to be able to make effective use of more time to 
think the problem through and correct initial errors. 

Question 16 of Part II, Form B, is one of a series based on a 
long (genuine) quotation from a newspaper. Only 66 percent of the 
professors got it right compared to 57 percent for the students. For 
the students, the correlation between this item and total scores on 
the form as a whole is marginal but acceptable. In this case, the 
professors' superior knowledge together with the time limit inter- 
fered with their scoring better relative to students. The question 
has to be carefully stated to cover all the contingencies a trained 
economist is apt to think of, but there was not time for the pro- 
fessors to consider them all and see that the loopholes had been 
plugged. As a result, 18 percent of the professors (24 percent of the 
students) chose the response stating that the question could not be 
answered without more information. 

What conclusions can be drawn from this discussion of indi- 
vidual questions? My first reaction was to fear that the low scores 
by professors on certain items indicated bad questions. But this 
turns out not to be the case. Nearly all the questions contribute 
toward discriminating degrees of knowledge. None detracts from it. 
Some, notably I-A-11 and II-A-23, contribute very little and for
the sake of efficiency are prime candidates for replacement if, as
I hope, a revised edition of the TUCE is brought out some day.
But even I-A-11 and II-A-23 tell us something of interest.

Rather than reflecting on the quality of the test, the comparison 
between professors and students in part confirms what we have 
always known: that economics is difficult. The fact that questions 
written for students in the elementary course sometimes are hard 
even for their professors is evidence tending to confirm the hy- 
pothesis that the elementary course typically is overloaded. We try 
to do too much; we expect too much of our students.

On the question of how we train college students to be intelligent 
citizens, voters, and molders of opinion on economic policy, the
evidence from the TUCE is at best no more than suggestive. 18 But, 
as indicated above, a number of questions on the TUCE hint that 
we neither try nor succeed in training students to use economic 
principles to evaluate critically what they read in the popular press. 



Two of the better questions written by members of the Stanford Semi- 
nar in New Developments in the Teaching of Economics (see Section II 
above) were based on the following quotation: 

Research activity engaged an 18 percent larger number of economists 
in 1966 than in 1964, as compared with an increase of only about 7 per- 
cent in the number of economists engaged in all activities. In spite of the 
influx of new personnel, the median salary of those engaged in research 
rose slightly more than average for economists as a whole. (N. Arnold 
Tolles, "The Economic Status of American Economists, 1966: A Preliminary
Report," American Economic Review, December 1967, p. 1816.)

W. C. Bonifield of Wabash College based the following question on it: 

"The seeming paradox of a relative increase in research economists con- 
current with a rising salary paid to research economists relative to all 
economists is best explained by the fact that 

1. research economists produce an easily measured output while other 
economists do not. 

2. much income paid to non-research economists is in the form of
fringe benefits. 

*3. the demand for research economists is also rising relative to the de- 
mand for all economists. 
4. research economists constitute only a small proportion of all economists."

18 The multiple choice format, in my opinion, cannot adequately measure
the ability of a student to combine economic analysis, value judgments, and 
careful consideration of tradeoffs in making up his mind on a policy ques- 
tion. Even for the more limited task of measuring ability to detect good 
economics from bad in newspapers and magazines, low scores on a group of 
multiple choice questions may mean merely that the examiner chose to make
the questions hard. It also could reflect a high degree of sophistication in 
the newspapers and magazines themselves— they may make only sophis- 
ticated errors. But that is too good to be true. 
C. G. Williams of Boston College based the following question on the
same quotation: 

"This is evidence that 

a) in the market for economists an increase in the supply of research 
economists (an outward shift of their supply curve) tends to re- 
duce the relative median salary of economists in non-research ac- 
tivities. 

*b) over the period as a whole there was an excess demand for research 
economists that was met partly by an increase in their number 
and partly by an increase in their salaries relative to economists 
engaged in all activities. 

c) the demand for research economists tended to be positively sloping 
because despite the fact that their salaries rose relative to econo- 
mists engaged in all activities, relatively more economists were able 
to find employment doing research work. 

d) among economists now entering the profession, research work carries 
prestige and status which means that the more able, the better 
trained, and hence the higher paid economists, actually prefer 
research positions." 

The questions asked and the correct answers are essentially the same in 
the two items, but the distractors are different. Bonifield's version has the
merit of conciseness, but Williams' could be cut during editing. Although 
only a tryout on students could determine whether the questions would 
work in practice, they look promising. They test the ability of a student 
to use supply-and-demand analysis to solve a problem. 19

One of the quotations given to the students as a basis for writing questions
was from a syndicated column, "Ask Andy" (the quotation appeared,
among other places, in the Nashville Tennessean, April 16, 1968, p. 16):
"Every dollar bill must have value — and its worth is backed by the government,
so the number of bills printed must be limited by the wealth of
the country in gold and money, in lots of other assets and resources." It is 
interesting that nobody succeeded in writing a good question on this bit of 
nonsense. 

The best of the questions based on a quotation or situation other than 
those furnished to the participants was submitted by William P. Kinney of 
Foothill College:

19 Bonifield's question was the runner-up in the contest described in Section
II of this paper. My decision in favor of Williams' question on commercial
vs. household laundering was based primarily on the belief that it is more
important to test whether students understand what economic efficiency is
all about than to test their skill in supply-and-demand analysis, important
as the latter is. Economists who are more interested in equipping students
with the tools of their trade may disagree with me. The fact that Williams
submitted a second question with merit, though irrelevant, strictly speaking,
to the awarding of the prize, reinforces my conviction that justice was done,
whether or not the decision of the judge is final.
"But the Vietnam war has had its impact on health programs for our
citizens. Medicaid expenditures— which provide assistance to the medically 
needy of whatever age— were restricted last year because of the budgetary 
squeeze." (From "First Things First" by Senator Eugene J. McCarthy.) 

The above quotation best illustrates: 

A. The inefficacy of fiscal policy in the Vietnam war years. 

B. The concept of opportunity cost. 

C. The clear need for a more expansionary monetary and fiscal policy. 

D. The need for greater "fiscal responsibility" on the part of the govern- 
ment. 

A psychometrician would probably criticize the question for lack of 
parallelism between the right answer and the distractors. The right answer 
is a concept, whereas the distractors are characterizations of policies. From 
the point of view of substance, the question is in part a test of vocabulary. 
It is more important for a student to be able to use a concept like op- 
portunity cost than to know the name economists have given it. But the 
question tests ability to recognize the idea in an unfamiliar context, and 
one can argue that the concept is so important that students should know 
its name.
The Test of Understanding

in College Economics 
and Its Construct Validity 



Darrell R. Lewis and Tor Dahl 



What does the new Test of Understanding in College Economics (TUCE) actually
measure, and of what significance is this to those interested in evaluating student
performance in the introductory college economics course? Although some
results obtained during the norming [11,6,17] and early experimental use of TUCE
[1,9,7] have been reported, a number of construct validity and research design questions
remain. The purpose of this paper is to present additional data on the TUCE, primarily
with regard to its validity as an experimental testing instrument and as to its construct
design.

In brief, the results from this study indicate that: (1) TUCE is effective in
discriminating between "good" and "poor" students in economics. (2) Although
academic ability and critical thinking skills are related to achievement in economics and
on TUCE, considerable independence between TUCE and prior ability exists. TUCE
incorporates prior ability and critical thinking skills while also effectively discriminating
on other knowledge. (3) The subparts of TUCE can be differentiated in that they do seem
to measure different things. (4) However, when associating the three subparts of TUCE
with critical thinking skills as measured by the Watson-Glaser Critical Thinking
Appraisal (CTA), the most significant associations occur with the "simple application"
and not the "complex application" types of questions. In fact, the complex application
questions in TUCE show minimal association with critical thinking skills. Our results
indicate that the researcher using TUCE must be cautious about imputing higher
educational value to complex application types of questions.



Darrell R. Lewis and Tor Dahl are Associate Professor of Economic Education and Lecturer 
of Economics in Public Health, respectively, at the University of Minnesota. 
Source of Data and the Test Instruments

Primary data for this paper were obtained from an experimental research study
dealing with critical thinking skills in the introductory course undertaken at the
University of Minnesota in 1969. 1 For the purpose of this paper, it is only important to
know that during the 1969 fall quarter, 784 University of Minnesota students in 23
sections of Economics I (Principles of Economics: Macroeconomics) were subjected to
before and after survey questionnaires as well as being pre- and posttested on the TUCE
(Part I, Forms A and B) [11] and the Watson-Glaser Critical Thinking Appraisal (CTA)
(Forms ZM and YM) [16].

The CTA, which has been nationally normed and validated, has frequently been used
as a measure of critical thinking achievement in instructional situations at the secondary
or college level and in industrial and executive programs [4,13,18,8]. It has also had
extensive usage as a research tool to determine the relationship between critical thinking
abilities and other abilities or traits [16,14].

The CTA instrument consists of a series of test exercises requiring the application of 
some of the more important abilities involved in critical thinking. The exercises include 
problems, statements, arguments and data interpretations similar to those which an 
enlightened citizen might encounter in his daily life as he might work, read newspapers or 
magazine articles, hear speeches, or participate in discussions of various issues. Of three 
such national instruments, the one selected for this study most closely approximates the 
criteria for critical thinking skills discussed by Paul Dressell [2], Morse and McCune 
[10], and the American Council on Education [3]. The test, available in two forms (YM 
and ZM), consists of five subtests designed to measure different, though interdependent, 
aspects of critical thinking.² Thus each form contains 100 items that can be completed in 
about 40 minutes. In some items the student is asked to think critically about problems 
involving "neutral" topics such as the weather, scientific facts or experiments, and other 
matters about which people generally do not have strong feelings or prejudices. Other 
items, approximately parallel in logical structure, pertain to social issues concerning 
which many people have emotional feelings, biases or prejudices. 

The TUCE has been discussed in detail elsewhere [17, 15]. Each of its four forms has 
three subparts of 11 questions involving (a) "recognition and understanding," (b) "simple 
application," and (c) "complex application." The primary purpose of the test is to 
provide a research instrument for controlled experiments, and thereby a basis for 
evaluating different versions of the introductory college economics course [17, p. 224, and 
15, p. 62]. Consequently, it is important that we have added data on TUCE as a 
measurement instrument. 



¹The research design and results of that study are reported in another paper and may be obtained 
by writing to the authors. Darrell R. Lewis and Tor Dahl, "Critical Thinking Skills in the Principles 
Course: An Experiment." Unpublished paper, University of Minnesota, 1970. 

²The five subtests are as follows: 
Test 1. Inference (20 items). Samples ability to discriminate among degrees of truth of inferences 
drawn from data. 

Test 2. Recognition of assumptions (16 items). Samples ability to recognize unstated assumptions 
which are taken for granted in given statements. 

Test 3. Deduction (25 items). Samples ability to reason deductively from statements, to recognize 
the relation of implication between propositions, to determine whether what may seem to be an 
implication from premises is indeed such. 

Test 4. Interpretation (24 items). Samples ability to weigh evidence and to distinguish between 
(a) generalizations from data that are not warranted beyond a reasonable doubt, and (b) gen- 
eralizations which, although not absolutely certain, do seem to be warranted. 

Test 5. Evaluation of arguments (15 items). Samples ability to distinguish between arguments 
which are strong and relevant and those which are weak or irrelevant to a particular question. 
TUCE is a Discriminating Measure 

As Rendigs Fels points out [6, pp. 4-5], we have preliminary evidence that TUCE, as 
a discriminating measure of performance in the principles of economics, is a valid 
instrument. "Before studying economics, college students in a national sample got 13 
questions right out of 33 (40%). After studying economics, they got 19 right (58%). A 
sample of professors got 29 right (88%). Thus, those who presumably know more 
economics score higher than those who know less. Furthermore, there is evidence that 
every single question works in the sense that good students are more likely to get it right 
than poor students."³ These results were essentially confirmed in the Minnesota data. 
Table 1 (column 5) indicates that the Minnesota students increased their scores on TUCE 
from a mean of 14.16 to 19.85, an increase of over 40 percent. This value-added of 40 
percent approximates the improvement factor of 41 percent found in the national 
averages of the norming data [17, p. 225]. 



Table 1 

TUCE Performance by Selected Groups† 

                         (1)        (2)        (3)         (4)            (5)
                       Honors     Top 27%     Middle    Bottom 27%   Total (2+3+4)
                                  (N=210)    (N=359)     (N=215)      (N=784)†††
                         X̄          X̄          X̄           X̄              X̄

Pre-TUCE               19.15      17.59      14.06       11.53          14.16
Post-TUCE              25.93      25.46      19.97       14.96          19.85
Increase                6.79       7.87       5.91        3.43           5.69
Percentage increase      35%        45%        42%         30%            40%


†Differences between pre- and post-TUCE in all five groups were statistically significant at the 
.001 level. 

††The 21 students from the two honors sections are included neither in the top 27 percent nor 
in any of the other Minnesota data reported in this paper. 

†††Mean scores for TUCE in Table 1 differ slightly from mean scores in the other tables; in the 
other tables the number of students included is slightly smaller as a result of incomplete ACT 
scores on transfer students and because some students failed to complete all of the 
questionnaires. 

Table 1 also indicates that TUCE effectively discriminates performance of varying 
ability levels. In fact, in both absolute and percentage terms, TUCE discriminates more 
effectively for the more capable regular students; nor does there appear to be a "ceiling 
effect" in the use of TUCE with the principles course. Even for students from two honors 
sections who were not included in the regular total (Table 1, column 1), TUCE 
effectively discriminated a value-added of 6.79 points, but since the honors sections 
produced a pre-TUCE average score of 19.15, this high base resulted in a lower 
percentage gain than was achieved by the upper 27 percent. Nonetheless, the honors 
sections still produced the highest average post-TUCE score. 
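
The value-added figures in Table 1 follow from simple arithmetic on the group means. The short Python sketch below recomputes the absolute and percentage gains from the reported pre- and post-TUCE means. The dictionary layout and the function names are illustrative assumptions, not part of the original study.

```python
# Pre- and post-TUCE group means as reported in Table 1 and the text.
# The data structure and names below are illustrative assumptions.
TABLE_1_MEANS = {
    "Honors": (19.15, 25.93),
    "Top 27%": (17.59, 25.46),
    "Middle": (14.06, 19.97),
    "Bottom 27%": (11.53, 14.96),
    "Total": (14.16, 19.85),
}


def gain(pre, post):
    """Return (absolute gain, percentage gain relative to the pretest mean)."""
    absolute = post - pre
    return absolute, 100.0 * absolute / pre


def gains_by_group(means=TABLE_1_MEANS):
    """Map each group name to its (absolute, percentage) gain."""
    return {group: gain(pre, post) for group, (pre, post) in means.items()}


if __name__ == "__main__":
    for group, (absolute, percent) in gains_by_group().items():
        print(f"{group:<12} gain = {absolute:5.2f} points ({percent:4.1f}%)")
```

Because the percentage is taken relative to the pretest mean, the honors sections' high base of 19.15 yields a smaller percentage gain than that of the top 27 percent, even though their posttest mean is the highest.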

Furthermore, the data indicate that those who scored highest on the pretest were 
approximately the same ones who scored highest on the posttest. The same held true for 
the lowest group; low pretest scorers were also low scorers on the posttest. Of the lowest 
27 percent of students on the pretest (215 students), we found 53 percent, or 115 students, 
still in the lowest 27 percent of scorers on the posttest. Similarly, of the top 27 percent of 
students on the pretest (210 students), we found 58 percent, or 121 students, still in the top 
27 percent on the posttest.⁴ 



³Interestingly, our 13 graduate student instructors for Economics I also took TUCE, and their 
results of an average 29 correct approximate Fels' sample of professors [7]. 


These results elaborate on those from the national norms as reported by Welsh and 
Fels [17, p. 225]. Although limitations of the national norming data precluded 
comparisons of good and poor students on a before and after basis, these types of 
comparison can be made on the strength of data we have processed for our 784 students. 
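
The before-and-after comparison of good and poor students described above amounts to counting how many members of a pretest group reappear in the same group on the posttest. A minimal Python sketch of that count follows. The function name, the tie-breaking rule, and the ten-student score lists are hypothetical and are not data from the study; only the counts 115 of 215 and 121 of 210 come from the text.

```python
def retained_share(pre, post, fraction=0.27, top=True):
    """Share of the pretest's top (or bottom) group still in that group on the posttest.

    pre and post are equal-length score lists indexed by student.
    Ties are broken by list position, which is an arbitrary simplification.
    """
    n = len(pre)
    k = max(1, round(n * fraction))  # size of the top or bottom group

    def group(scores):
        ranked = sorted(range(n), key=lambda i: scores[i], reverse=top)
        return set(ranked[:k])

    return len(group(pre) & group(post)) / k


# Hypothetical scores for ten students (not data from the study).
PRE = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
POST = [15, 14, 20, 19, 25, 22, 30, 27, 33, 29]

# Shares implied by the counts reported in the text.
REPORTED_BOTTOM = 115 / 215  # lowest 27 percent still in the lowest 27 percent
REPORTED_TOP = 121 / 210     # top 27 percent still in the top 27 percent

if __name__ == "__main__":
    print(f"hypothetical top-group retention: {retained_share(PRE, POST, 0.3):.2f}")
    print(f"reported bottom retention: {REPORTED_BOTTOM:.1%}")
    print(f"reported top retention:    {REPORTED_TOP:.1%}")
```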

TUCE and Content Validity 

As pointed out by the American Economic Association's Test Committee 
responsible for the construction of TUCE, "... to the extent that TUCE measures what 
the Committee judged important, the Test has content validity, and is a valid test" [11, p. 
15]. Content validity of TUCE, as defined above, cannot be determined statistically. 
However, an important measure for each experimenter using TUCE could be the degree 
to which the students' course grades or other relevant examination scores correlate with 
post-TUCE performance. As Table 2 indicates, the Minnesota data had a post-TUCE 
zero-order correlation coefficient of .58 with the student's final grade. It is important to 
note that the instructors did not teach in order to improve performance on the 
instrument, nor did they use the post-TUCE scores in computing the final grade. In fact, 
six of the 13 graduate student instructors expressed opinions concerning the 
"inappropriate" coverage of TUCE relative to "their" principles course. In the light of 
these qualifications, it is surprising that the correlation coefficient is as high as it is. 

Another method of examining what TUCE measures is to study correlations 
between performance on TUCE and performance on other tests. Although similar data 
have been developed by Saunders for College Entrance Examination Board Verbal and 
Mathematical Test scores [11, p. 18], none have been presented for student scores on 
either the American College Test (ACT) or the CTA. 

In our study students took TUCE (Part I, Form A) as a pretest and TUCE (Part I, 
Form B) as a posttest, and the TUCE scores were correlated with both ACT and CTA 
test scores. Some results are presented in Table 2. By its very nature, TUCE as a measure 
of achievement might be expected to show a relatively high relationship to such measures 
of ability as ACT and CTA. However, in the light of such expected relationships the 
correlations in Table 2 are moderately low. This substantiates the results found by 
Saunders in which the highest College Board-TUCE zero-order correlation was .44. It 
further indicates that although academic ability (as measured by ACT) and critical 
thinking skills are related to achievement in economics and on TUCE, considerable 
independence between TUCE and these other abilities also exists. TUCE 
incorporates prior ability and critical thinking while also effectively discriminating on 
other kinds of knowledge.⁵ 
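
One way to see why correlations of this size leave "considerable independence" is to square them: the square of a zero-order correlation is the share of variance two measures have in common. The Python sketch below computes a zero-order (Pearson) correlation from raw scores and converts the reported coefficients to shared variance. The function names are illustrative, and the dictionary holds only coefficients quoted in the text.

```python
import math


def pearson_r(x, y):
    """Zero-order (Pearson) correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r * r


# Coefficients quoted in the text (labels are illustrative).
QUOTED_CORRELATIONS = {
    "post-TUCE with final course grade": 0.58,
    "highest College Board-TUCE correlation (Saunders)": 0.44,
}

if __name__ == "__main__":
    for label, r in QUOTED_CORRELATIONS.items():
        print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.0%}")
```

Even the strongest coefficient, .58, implies that roughly two-thirds of the variance in post-TUCE scores is not shared with the course grade.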



⁴In separate multiple linear regressions not reported in the text or tables we found confirmation of 
our suggestion that the amount of economic knowledge (as measured by TUCE) that the student 
brings into the course makes a difference in postcourse performance. When pre-TUCE test 
scores were included as an independent variable in a multiple regression which included 
post-TUCE as the dependent variable, the pre-TUCE test scores were highly significant (at greater 
than a .01 level). In fact, the resulting regression coefficient of .30 indicated that approximately 
1 point was contributed to the student's post-TUCE score for every 3 correct scores on pre-TUCE. 

⁵In separate multiple linear regressions not reported in the text or tables, we found confirmation 
for these zero-order correlations and inferences. When pre-CTA test scores were dropped from a 
multiple regression which included post-TUCE as the dependent variable and pre-TUCE as an 
independent variable, the pre-TUCE test scores became much more significant, with their regression 
coefficient more than doubling. Similarly, the R² dropped by over .10. To some extent, CTA and 
TUCE are surrogates for one another—i.e., they both measure similar types of "thinking 
processes" or "skills." However (and as inferred above) they still give individually large residual 
measurement to other knowledge. Consider, for example, the significant drop in the R² as a result 
of dropping the pre-CTA scores. 




Table 2 

Zero-Order Correlations of TUCE Part I with Scores on Selected Aptitude Measures 
(N = 649) 

                                                TUCE Form A               TUCE Form B
                                                  Pretest                   Posttest
                                           (X̄ = 14.42, SD = 3.74)    (X̄ = 19.96, SD = 4.01)

1) Total ACT score†                                 .41                       .44
   (X̄ = 24.10, SD = 3.56)
2) Critical Thinking Appraisal (Form ZM)            .41                       .37
   (X̄ = 70.10, SD = 9.24)
3) Critical Thinking Appraisal (Form YM)††          .39                       .38
   (X̄ = 71.32, SD = 7.83)
4) Cumulative grade point average                   .30                       .41
   (X̄ = 2.55, SD = .52)
5) Grade in the Principles course                                             .58
   (X̄ = 2.41, SD = 1.07)

†ACT has a zero-order correlation coefficient of .52 for CTA (ZM) and .48 for CTA (YM). 
††All post-CTA (YM) scores were converted into pre-CTA (ZM) equivalent raw scores [16, p. 8]. 



TUCE and Construct Validity 

One measure of construct validity for TUCE may be logically inferred from the Test 
Committee's definition of the three subparts as presented in the Manual [11]. Moreover, 
recent research [1] has assumed the existence of identifiable subdivisions on TUCE and 
has consequently interpreted significant differences in the results on this basis. 

Evidence of construct validity can also be obtained from a study of the ways in which 
the various parts of the test are related to each other and to the test as a whole. The 
interrelationships of the three subtests and the total test are reported in Table 3 from our 
data. The moderately low subtest zero-order correlation coefficients, ranging from .33 to 
.36, support the contention that relatively distinctive abilities are being measured with 
sufficiently small overlap to warrant their inclusion in one total score. 

Table 3 

Correlation Matrix on TUCE and Subparts of TUCE, Interaction Terms Included 
(N = 784) 

                                         TUCE       X1     X2     X3     X4     X5     X6
                                       (Form IB)    R      S      C      RS     CS     CR

Recognition/Understanding (R)    X1      .75
Simple Application (S)           X2      .74       .36
Complex Application (C)          X3      .77       .36    .33
R · S                            X4      .89       .78    .84    .43
C · S                            X5      .91       .45    .82    .79    .79
C · R                            X6      .92       .79    .43    .83    .74
C · S · R                        X7      .96       .71    .74    .72    .91




Table 3 is corroborated by the following regression equation, which includes 
multiplicative interaction terms from our data: 

Post-TUCE = 1.73 (constant) + .82 (R) + .76 (S) + .77 (C) 
                              (9.66)    (7.91)    (8.88) 

            + .03 (RS) + .03 (CS) + .03 (CR) - .003 (CSR) 
              (1.85)     (2.41)     (2.10)     (-1.89) 

Adj. R² = .99 

The t-values are given in parentheses. 

Two of the pair combinations show significant interaction, but both the t-values and 
the size of the corresponding regression coefficients are small compared to the 
noninteraction terms. Obviously higher level interaction could be present, but the high 
R² does not seem to warrant further tests, at least not for predictive purposes. The 
recognition and understanding term (R) seems to be the most identifiable or "purest" 
part of post-TUCE; while this distinction is small, it is intuitively acceptable. One would 
expect application questions to have more transferability or overlap than recognition and 
understanding questions. 
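
To make the role of the interaction terms concrete, the fitted equation above can be evaluated directly. The Python sketch below does so and lists which terms exceed a critical t-value. Two assumptions are ours rather than the authors': the two-sided large-sample critical value of 1.96 for the .05 level, and the choice to evaluate the equation at the subpart means of 6.90, 6.05, and 7.01 reported in Table 5 (which come from the smaller N = 649 sample), purely for illustration.

```python
def predicted_post_tuce(r, s, c):
    """Evaluate the reported regression equation for subpart scores R, S, and C."""
    return (
        1.73
        + 0.82 * r
        + 0.76 * s
        + 0.77 * c
        + 0.03 * r * s      # R x S interaction
        + 0.03 * c * s      # C x S interaction
        + 0.03 * c * r      # C x R interaction
        - 0.003 * c * s * r # three-way interaction
    )


# t-values printed beneath the equation.
T_RATIOS = {
    "R": 9.66, "S": 7.91, "C": 8.88,
    "RS": 1.85, "CS": 2.41, "CR": 2.10, "CSR": -1.89,
}


def significant_terms(t_ratios, critical=1.96):
    """Names of terms whose |t| meets an assumed two-sided critical value."""
    return [name for name, t in t_ratios.items() if abs(t) >= critical]


if __name__ == "__main__":
    print(f"prediction at subpart means: {predicted_post_tuce(6.90, 6.05, 7.01):.2f}")
    print("terms with |t| >= 1.96:", significant_terms(T_RATIOS))
```

Under that assumed cutoff, only the CS and CR pairs among the interaction terms qualify, which agrees with the statement that two of the pair combinations show significant interaction.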

A further measure of construct validity for the three subparts of TUCE can be 
gained by relating TUCE and its subparts to other types of measuring instruments such 
as the CTA discussed earlier in this paper. Regressions 1-8 in Table 4 present results from 
simple linear regressions with TUCE and its subparts on CTA. As both the regression 
coefficients and intercepts are significantly different for CTA in the regressions with the 
three TUCE subparts (regressions 3-8), it appears that separation of the three subparts of 
TUCE is valid in that they can be differentiated when referenced to the critical thinking 
skills measured by CTA. Although the regression coefficients for CTA with both simple 
and complex application types of questions (regressions 4 and 5, 7 and 8) have 
approximately identical values, the intercepts are significantly different. 



Table 4 

Regression Results with Post-TUCE on CTA 
(N = 649; t-ratios in parentheses) 

                                             Independent Variables†
Dependent Variable                     Constant†    Pre-CTA    Post-CTA    Adj. R²†

1. Post-TUCE
2. Post-TUCE
3. Post-Recognition/Understanding
4. Post-Simple Application
5. Post-Complex Application
6. Post-Recognition/Understanding
7. Post-Simple Application
8. Post-Complex Application

†All significant at .05 level. 
Notice also the improvement in significance obtained by regressing postvalues with 
each other, as compared to pairing pre- and postvalues. This is to be expected, as a likely 
outcome of the intervening educational process. Overall, the application categories of 
post-TUCE correlate more strongly with CTA than the recognition and understanding 
terms. This lends weight to the previous discussion of transferability. The recognition and 
understanding category measures more specific economic knowledge; the simple and 
complex application categories are perhaps more general in nature. 

Adding other significant terms to the regression, we obtain:⁶ 

Post-CTA = 37.94 - .58 (College Math) + 3.80 (Reads Newspaper) 
                   (-2.02)              (2.27) 

           + .57 (ACT Score) + .23 (Pre-CTA) + .24 (Pre-TUCE) 
             (6.23)            (7.56)          (3.74) 

           + .65 (Post-Simple) + .40 (Post-Complex) 
             (4.00)              (2.73) 

Adj. R² = .39 

The t-values are given in parentheses. 

Thus, with post-CTA scores as the dependent variable and employing a step-down 
procedure with a .10 cut-off level, we found that "post-recognition and understanding" 
types of questions were dropped along with a number of other independent variables. 
Although "recognition and understanding" did not drop until quite late in the step-down 
procedure, it confirmed our expectations that these types of questions represent less 
general conceptual skills as measured by CTA. 

It is important to note that both pre-TUCE as a total score and the two post- 
subparts of "simple application" and "complex application" remained significant during 
the step-down regression procedure. Furthermore, simple application, with a regression 
coefficient of .65, is a stronger influence on CTA than is the .40 coefficient for complex 
application—again indicating that simple application types of questions have a higher 
association with critical thinking skills (as measured by CTA) than do the complex 
application questions.⁷ The zero-order correlations show a slightly stronger relationship 
between (R) and (C) than between (S) and (C). Thus, the complex application questions' 
closer affinity to "pure economics" may remove them from general critical thinking skills 
more than the simple application questions. This is important to recognize to the extent 
that some researchers employing TUCE impute a higher general educational value to the 
complex application types of questions, assuming that they measure a higher (more 
transferable) reasoning ability. This may be true and they may have higher educational 



⁶The significance of, and discussion for, these regression results are given in another paper, 
"Critical Thinking Skills in the Principles Course: An Experiment." Copies may be obtained by writing 
to the authors of this paper. 

⁷It has been suggested that the lower and less significant regression coefficient for post-complex 
application questions results from the nature of these questions in that they cannot be answered 
without some prior knowledge involving "recognition and understanding" items—i.e., they are 
sequentially dependent upon specific recognition and understanding knowledge. It is further argued 
that in a multiple linear regression with all three TUCE subparts included, a portion of the 
"complex" variable is already accounted for and held constant by the "recognition" variable. If these 
conditions are true, then it is not surprising that the post-complex coefficient is lower and less 
significant. However, in the multiple regression above, the "recognition" variable has already 
dropped in the step-down procedure without significantly altering the constant or any of the other 
regression coefficients. Thus, the total independent (and less significant) condition of post-complex 
cannot be rejected on this basis. 
Table 5 

Regression Equations on Subparts of Critical Thinking Skills 
(N = 649; t-ratios in parentheses) 

                                            Independent Variables (Post-CTA Subparts)
                                           X1          X2           X3            X4            X5
Dependent Variable         Constant††   Inference  Assumptions   Deduction   Interpretation  Evaluation   Adj. R²††

1. Post-TUCE (IB)             4.96       .147††      .120†        .383††        .140††         .077         .14
   (X̄ = 19.96)                          (2.38)      (1.47)       (6.02)        (2.02)         (.90)

2. Recognition and            3.20       .062††      .014         .101††        .031           .006         .05
   Understanding                         (2.18)      (.34)        (3.45)        (.97)          (.14)
   (X̄ = 6.90)

3. Simple Application         0.18       .044†       .067††       .150††        .040†          .051†        .12
   (X̄ = 6.05)                           (1.64)      (1.92)       (5.45)        (1.35)         (1.39)

4. Complex Application        3.Ci       .032        .021         .148††        .025          -.021         .02
   (X̄ = 7.01)                           (.57)       (.29)        (2.56)        (.39)          (-.27)

†Significant at .10 level. 
††Significant at .05 level. 


value, but any researcher using TUCE will have to rationalize this higher value on the 
basis of other criteria than those "critical thinking skills" or "reasoning abilities" 
measured by CTA. 

Tables 5 and 6 give additional insight into the types of critical thinking skills which 
are involved in resolving the problems and questions of the three TUCE subparts. 



Table 6 

Correlation Matrix for Post-TUCE and Post-CTA Subparts 

                                   TUCE
                                 (Form IB)    X1     X2     X3     X4     X5     X6     X7     X8

Inference                  X1      .19
Assumptions                X2      .19       .20
Deduction                  X3      .34       .26    .32
Interpretation             X4      .23       .20    .29    .42
Evaluation                 X5      .15       .11    .14    .30    .23
Recognition and
  Understanding            X6      .75       .14    .09    .20    .13    .08
Simple Application         X7      .74       .16    .19    .32    .20    .16    .36
Complex Application        X8      .77       .06    .06    .13    .07    .03    .36    .33
Total CTA (Form YM)                .38                                          .25    .38    .34




Note that student performance on "inference," "deduction," and "interpretation" 
as defined by the CTA are all significantly associated at a 5 percent level with postcourse 
performance on TUCE (regression 1). For example, 1 point on the "deduction" subpart 
of CTA contributed .38 point to TUCE. Equally important are the results which show 
that student performances on TUCE are much less associated with their abilities to 
discriminate on the basis of "recognizing assumptions" or "evaluating arguments," 
measured by CTA. However, recognition of assumptions is significantly associated with 
answering "simple application" types of questions correctly (regression 3). Presumably 
the nonsignificant association of "assumptions" on total TUCE in regression 1 results 
from its very low and nonsignificant associations with both "recognition and 
understanding" and "complex application" types of questions in regressions 2 and 4, thus 
strengthening its significance with "simple application" from regression 3. 

The results of the zero-order correlations from Table 6 confirm these observations. 
"Deduction" has a correlation coefficient of .34 with TUCE, while ability to recognize 
assumptions (X2) correlates most highly with simple application (X7). 

A second observation from Tables 5 and 6 indicates that when we relax the 
significance level to 10 percent we note that simple application (regression 3) significantly 
associates with all five CTA subparts. The relatively high correlation coefficients in Table 
6 for simple application (X7) with the five CTA subparts (X1, X2, X3, X4, X5) also 
support the regression results. In every instance, the correlation coefficients for each of 
the five CTA subparts are highest for simple application and lowest for complex 
application. This further substantiates the regression results in Table 4 and the multiple 
regression discussed earlier wherein "simple application" types of questions associated 
more closely with those abilities measured by CTA than did either of the other two 
TUCE subparts. Again, it appears that "complex application" types of questions do not 
correlate closely with many of the critical thinking skills or attributes popularly 
associated with such questions [11, p. 7].⁸ 

A third observation to be noted from Tables 5 and 6 concerns the relative 
importance and association of "recognition and understanding" types of questions with 
critical thinking skills as measured by the CTA. Regression 2 significantly associates 
both inference and deduction abilities to performance on recognition and understanding, 
while only deduction associates with complex application in regression 4. These results 
are confirmed further in Table 6, where in every instance recognition and understanding 
(X6) has a higher correlation coefficient than complex application with the five subparts 
of CTA (X1, X2, X3, X4, X5). 
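
The ordering claimed in the second and third observations can be checked mechanically against the zero-order correlations in Table 6. The Python sketch below does this; the coefficients are transcribed from the table, while the dictionary layout and function names are illustrative assumptions.

```python
CTA_SUBPARTS = ["Inference", "Assumptions", "Deduction", "Interpretation", "Evaluation"]

# Correlations of each TUCE subpart with the five CTA subparts,
# transcribed from Table 6 in the order listed in CTA_SUBPARTS.
TABLE_6 = {
    "Recognition and Understanding": [0.14, 0.09, 0.20, 0.13, 0.08],
    "Simple Application": [0.16, 0.19, 0.32, 0.20, 0.16],
    "Complex Application": [0.06, 0.06, 0.13, 0.07, 0.03],
}


def ranking_holds(table=TABLE_6):
    """True if, for every CTA subpart, simple > recognition > complex."""
    rows = zip(
        table["Simple Application"],
        table["Recognition and Understanding"],
        table["Complex Application"],
    )
    return all(simple > recog > complex_ for simple, recog, complex_ in rows)


def strongest_cta_subpart(tuce_subpart, table=TABLE_6):
    """CTA subpart with the highest correlation for the given TUCE subpart."""
    values = table[tuce_subpart]
    return CTA_SUBPARTS[values.index(max(values))]


if __name__ == "__main__":
    print("simple > recognition > complex in every column:", ranking_holds())
    for name in TABLE_6:
        print(f"{name}: strongest CTA subpart is {strongest_cta_subpart(name)}")
```

For all three TUCE subparts the strongest association is with deduction, which is consistent with the regression results in Table 5.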

This significance and association of recognition and understanding types of 
questions with critical thinking skills confirms the TUCE Test Committee's rationale for 
this category [11, p. 6]: 

Questions in this category need not (and should not) be answerable by rote 
memory. The better questions of this type involve restatement or recognition of an 
idea in somewhat different language from that in which it was originally learned. 
These questions may call for explanation, for summarization, or for simple 
extension of an idea. Thus, such questions can and should test understanding or 
comprehension rather than recall. 
Clearly, as indicated by the cross validation with CTA, these types of questions can and 
do test understanding and comprehension rather than simply recall. 

However, as the Test Committee appropriately notes, "... the classification of items 
by objectives has been a priori, based largely on the judgment of the cooperating 
economists and some of their graduate students. Whether the actual mental processes of 
undergraduate students tackling these questions would correspond to the classifications is 
not surely known" [11, p. 7]. If (and assuming that) the CTA measures many of the 
things involved in the kind of "orderly thinking" called for by the National Task Force 
on Economic Education under the heading, "A Rational Way of Thinking About 
Economic Problems" [12, pp. 14-17], then we must question whether complex 
applications do, in fact, measure these objectives [11, p. 7]. Our data indicate that simple 
application has the most significant association with these objectives. Even recognition 
and understanding associates more highly with critical thinking skills measured by 
CTA than does complex application. Suggestively, complex application types of 
questions either have to be changed, rationalized on some other basis, or subjected to 
further testing and scrutiny.⁹ 



⁸It has been suggested by Henry Villard that the fact that complex application questions do not 
correlate as highly with the CTA as the other TUCE subparts may suggest that they, in fact, have 
"special educational values and attributes." It is argued that any bright student (as measured by 
the CTA) can handle simple application and recognition and understanding questions. However, an 
effective measure of economic understanding apart from intelligence (and whatever else the CTA 
measures) can be indicated by performance on complex application questions. This may be true. 
However, it again raises the problem of testing the assumption against another validated 
instrument for measuring whatever constitutes this facet of "economic understanding" or that 
component which complex application questions measure. Possibly the Graduate Record Examination or 
the College Level Examination Program in Economics of the College Entrance Examination Board 
could provide such a vehicle for testing this hypothesis. 

⁹In an effort further to identify possible relationships within the 33 TUCE questions and between 
the TUCE and CTA measures, two factor analyses were performed. The first analysis included 
only the 33 questions in post-TUCE (Part I, Form B) while the second included the five CTA 
subparts with the TUCE questions. In both analyses the resulting varimax factor loading matrix (for 
the eleven and twelve factors, respectively) indicated that very little of either the TUCE content 
categories or subpart classifications [11, p. 8] could be explained by the factor structures. The only 
Concluding Comments 

As always in studies like this, one is well advised to resist generalizations based upon 
a single source of data; clearly the study needs to be replicated. Yet some tentative results 
do give important insights into the validity of TUCE both as a research and as a 
measurement instrument. 

This study shows that TUCE is not only effective in discriminating between "good" 
and "poor" students in economics, but that it also measures prior ability and critical 
thinking skills while effectively discriminating on other knowledge. In addition, the 
subparts of TUCE, as developed by the Test Committee, can be shown to measure 
significantly different things. But, when associating the three subparts of TUCE with the 
critical thinking skills measured by CTA, the most significant associations occur with 
simple and not complex application types of questions. Our results indicate that the 
researcher employing TUCE must be cautious about imputing certain educational values 
and attributes to complex application questions. The recognition and understanding 
questions, however, were appropriately classified by the Test Committee in that they 
seem to measure elements other than just memorized items from economic texts. It is 
suggested that further validation studies should examine the contribution of the complex 
application questions of TUCE and assess their role in measuring performance in 
economics. 
Educational Production Function for 
an Introductory Economics Course 

Elisabeth Allison 



Harvard has been teaching introductory 
economics for 162 years: as a branch of 
moral philosophy for 50 and as a separate 
discipline for 112. In intellectual terms the 
core of shared experiences for the 87,000 
students who have passed through the 
course is minimal: Adam Smith's Wealth of 
Nations is the only reading common to both 
courses. Over time, the course has evolved, 
following the development of modern eco- 
nomics. Moreover, today's course is not on- 
ly substantively different from its predeces- 
sor; it is demonstrably better. The student 
who passes Economics 10 today can answer 
many questions that puzzled students (and 
indeed their professors) in the 1870s— for ex- 
ample, Adam Smith's paradox of diamonds 
and water or Malthus's observation of 
"general gluts." 

The contrast between this steady, sys- 
tematic cumulation of knowledge in the 
discipline of economics and the noncumula- 
tive, essentially personal nature of knowl- 
edge about the teaching of economics is 
striking. In 1973— when the effort described 
in this paper began— it was difficult to ar- 
ticulate a single proposition about teaching 
economics effectively that was known to to- 
day's instructors but hidden from eight 
preceding generations of teachers. 

The failure to develop a body of substan- 
tive propositions about the teaching of 
economics was mirrored in every facet of the 
course. When cutbacks in the instructional 
budget had to be made in 1972, the decision 
to have larger classes with better-qualified 
instructors was based on precisely the same 



information on which the opposite decision 
had been made in 1888. No attempt was 
made to train new instructors because there 
was no body of knowledge on pedagogy to 
be conveyed. With no propositions to be 
tested there was no point to the systematic 
collection of data on course inputs or out- 
puts. 

This contrast was all the more ironic 
because the process was well suited to em- 
pirical investigation. The course had large 
numbers of students (800-900 per year), rich 
data from Harvard's elaborate admissions 
process, well-defined performance stan- 
dards, and a decentralized course structure 
that permitted "controlled" experiments. In- 
deed the situation offered the prospects of 
data far more reliable than those available 
for most economic analysis. Moreover there 
existed, within the discipline of economics,
a body of sophisticated research in which the 
problems of modeling production processes 
and estimating their parameters had been 
well worked out and a set of extensions 
which suggested that this model had general 
applicability. 

Nevertheless, the contrast and the oppor- 
tunity it posed might well have passed un- 
noticed for another decade had it not been 
for two events. The first was a decision to 
experiment with a new pedagogy, self-paced 
instruction (SPI). 1 A year's experience suggested
that SPI altered the whole learning
process, not just a few exam scores. But 
without a model of the basic process, it was 
impossible to quantify and evaluate those 
SPI effects. The second was a warning that 
sharp cuts might be made in the instructional
budget, cuts that would have to be made 
without any measures of their educational 
impact. Together with a certain intellectual 
curiosity these events were sufficient to 
generate the project described in this 
paper: a major attempt to analyze the process
of education in the introductory
economics course. 

This paper describes a major substantive 
study of economic education. The study in- 
cluded (1) construction of a three-equation 
model consisting of a "behavior function," an
equation describing student decisions about 
the allocation of time and effort; a "produc- 
tion function" relating student effort, ability, 
and pedagogy to achievement in economics; 
and a "profit function" relating student effort 
and achievement to student enjoyment of the 
course; (2) collection over a three-year 
period of data on 2,400 students required to 
test the validity of that model; and (3) 
estimation of that model with regard for the 
simultaneity and nonlinearity inherent in the 
educational process. 

The results of this project are interesting 
from several perspectives. They suggest, for 
example, that the maximizing model and 
standard technological assumptions of 
economics are useful for describing the 
educational decisions of college-age students. 
The results suggest that functions more 
general than Cobb-Douglas may be prefer- 
able when the data are sufficiently fine- 
grained. The elasticity of substitution can indeed
differ from 1. But what is most important
is that this study has produced a
plethora of robust and interesting proposi- 
tions about the educational process which 
speak to our original concerns. 

1. Today we have a set of general propo- 
sitions about the learning process in the 
course. For example: 

- For the average student, the "elasticities" 
of ability, pedagogic inputs, and effort 
are roughly .89, .40, and .25; 

- The largest influences on the time students 
devote to the course are tastes and out- 
side commitments; 

- Pedagogic inputs appear primarily to 
change the timing and mix of effort, 
rather than its absolute quantity. 

2. An evaluation of our pedagogic in- 
novations in detail reveals, for example: 



- Self-paced instruction increases student 
achievement as measured by standard 
course exams, but not student enjoyment; 

- Case-based instruction increases student 
enjoyment of the course but leaves 
achievement unchanged; 

- Self-paced instruction increases the mean 
time spent on the course slightly, but 
sharply decreases the variance. 

3. An analysis of the productivities of 
various educational inputs suggests that con- 
trary to the conventional wisdom, for exam- 
ple: 

- Neither age nor experience of instructor 
has a significant effect upon student per- 
formance; 

- Student evaluation of instructors' com- 
petence, pedagogic skills, and empathetic 
qualities explains none of the variance in 
student performance on standard tests of 
understanding of economics; 

- Some "objective" characteristics of in- 
structors-grades in graduate courses and 
hours spent preparing classes - explain 
about 20 percent of the variance in stu- 
dent performance; 

- Student evaluations of instructors' com- 
petence and pedagogic skills account for 
about 60 percent of the explained 
variance in student enjoyment of the 
course. 

This listing of results is not intended to imply
that the issues of economic education
have been resolved by this analysis; indeed a 
more sophisticated mode of analysis in- 
variably creates new problems even as it 
resolves old ones. Nevertheless, the results 
should be sufficient to persuade conven- 
tional economists that the application of a 
maximizing model to problems of educa- 
tional choice is not just empty formalism, 
and to persuade economic educators that the 
costs of moving to a conceptually, empiri- 
cally, and statistically more sophisticated 
method of evaluation are far outweighed by 
the value of the propositions uncovered. 

The study is divided into five parts. Sec- 
tion I develops the model of educational 
choice. Section II describes the data and 
methods used to estimate the model. The 
results of estimating the model are presented 
in Section III. Section IV reports estimates 
for subgroups of special interest: very bright
students, students of average ability, women 
students, and self-paced students. Section V
summarizes the findings, reservations, and 
unanswered questions. 



I. THE MODEL 

The model described below, while more 
elaborate than those usually found in the 
economic education literature, can be ap- 
proached intuitively as follows: If one were 
asked to evaluate the results of an educational
innovation or arrive at a judgment
about the trade-off between intelligence and 
effort in an introductory economics course, 
an obvious first approach (and the one taken 
in most of the education literature) 2 would 
be to estimate a model of the form: 

[0]  ACH = a_0 + b_1 ABIL + b_2 EFF + b_3 TEACH + b_4 PED + e

where

ACH = student mastery of economics

ABIL = student ability: Scholastic Aptitude
Test scores, math background, etc.

EFF = measures of student effort: attendance,
percent of homework completed, etc.

TEACH = instructor characteristics: experience,
academic rank, preparation time, etc.

PED = innovation: 1 if participating student;
0 otherwise

and interpret the coefficients as marginal 
productivities of the various inputs. (A com- 
plete list of abbreviations with definitions is 
included at the end of this article.) 

Upon reflection, however (particularly if 
one were familiar with the production 
literature), it would be clear that such a
model provided rather limited information; 
a single-equation model necessarily provides 
information only on achievement, not on 
other outputs such as enjoyment or concen- 
tration. It would be equally clear that if, for 
example, an hour's extra study time does not 
produce the same gain in mastery for every 
student regardless of ability or if study is 
subject to diminishing returns (i.e., if the 
assumptions of linearity and additivity do 
not hold), or alternatively, if effort and 



achievement are simultaneously determined, 
(i.e., if students work at what they're good at 
and are good at what they work at), then 
EFF and e would be correlated. If the 
assumption that \sigma_{xe} = 0 (no covariance between
a regressor and the error term) is violated, ordinary
least squares (OLS) estimates will be
biased and inconsistent; OLS estimates of b_2,
for example, will overstate the gains to the 
average student from increasing his efforts in 
the course. 3 
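
The direction of this bias can be illustrated with a small simulation. The sketch below is not the authors' procedure; it assumes a hypothetical two-equation system in which achievement depends on effort and effort in turn responds to achievement. The parameter values (0.25 and 0.5), the sample size, and all function names are assumptions chosen only for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_students(n=20000, b_eff=0.25, c_ach=0.5, seed=0):
    """Draw n hypothetical students from the assumed two-equation system
         ACH = b_eff * EFF + e        (achievement depends on effort)
         EFF = c_ach * ACH + taste    (effort depends on achievement)
    and return the lists (EFF, ACH)."""
    rng = random.Random(seed)
    effort, achievement = [], []
    for _ in range(n):
        taste = rng.gauss(0.0, 1.0)  # exogenous taste for study
        e = rng.gauss(0.0, 1.0)      # unobserved achievement shock
        # Solve the two equations jointly for effort (reduced form).
        eff = (taste + c_ach * e) / (1.0 - b_eff * c_ach)
        effort.append(eff)
        achievement.append(b_eff * eff + e)
    return effort, achievement


TRUE_B = 0.25  # assumed true marginal product of effort
effort, achievement = simulate_students(b_eff=TRUE_B)
ols_b = ols_slope(effort, achievement)
print(f"true b = {TRUE_B:.2f}, OLS estimate = {ols_b:.2f}")
```

Because effort is positively correlated with the achievement shock in this setup, the single-equation OLS slope lands well above the assumed true value, which is the overstatement described above.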

Fortunately, both problems are manage- 
able. A natural first step is to deal with the 
simultaneity problem by the explicit introduc- 
tion of an effort equation which recognizes 
the interdependence of effort and achievement. 4
The education literature does not
contain a model of the allocation of effort; 
however, one can be derived from the theory 
of utility maximization. 

Assume that each student is a utility max- 
imizer, endowed with some set of abilities 
and tastes (assumed to be predetermined) 
and a technology defined by the teaching 
technology and quality of the instructor 
assigned to him. The student controls one in- 
put, effort. For the time being, assume that 
effort is allocated along a single dimension 
and that this effort produces a grade and 
consumption benefits in some fixed relation- 
ship, that is, that one hour of work yields a 
fixed proportion of utility and marginal pro- 
ductivity with respect to grade received; no 
trade-off between the two outputs is possi- 
ble. Time, a surrogate for effort, is allocated 
among three occupations: economics, all other 
courses, and all other activities. Utility is a 
function of the direct utility of the three time 
allocations and of the expected grade to be 
received in economics and other courses. 
(Thus we are in addition making an assump- 
tion of risk neutrality.) For an individual, let 
U_i = utility received by the ith student;
T = student time; G = anticipated grade;
EC = economics; AOC = all other
courses; AOA = all other activities. Then

    U_i = f_i(T_{EC}, T_{AOC}, T_{AOA}, G_{EC}, G_{AOC})

    T_{EC} + T_{AOC} + T_{AOA} = 24 hrs/day

    G_{EC} = g_i(T_{EC});  G_{AOC} = h_i(T_{AOC})

The marginal conditions are as follows:

    (\partial U_i / \partial G_{EC}) (\partial G_{EC} / \partial T_{EC})
        = (\partial U_i / \partial T_{AOA}) - (\partial U_i / \partial T_{EC})

    (\partial U_i / \partial G_{AOC}) (\partial G_{AOC} / \partial T_{AOC})
        = (\partial U_i / \partial T_{AOA}) - (\partial U_i / \partial T_{AOC})

Time is allocated to economics and to all 
other courses until the expected utility of the 
increased grade derived from the final hour 
of study minus the disutility of the study 
itself equals the (net) marginal utility of an 
hour spent in each other way. In particular, 
then, we would expect those with a taste for 
study to work more, and those with a taste 
for other activities to work less. Those with 
a taste for high grades would study more and 
those who believe themselves more efficient 
at the margin in studying (ceteris paribus) 
would study more. 
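
For readers who want the intermediate step, the marginal conditions can be recovered from a standard Lagrangian. The derivation below is a sketch under the assumptions already stated in the text (a single time constraint of 24 hours and grades that depend only on own-course study time); the multiplier \lambda and the function names g_i and h_i for the two grade functions are notation introduced here for illustration.

```latex
\begin{aligned}
\mathcal{L} &= f_i\bigl(T_{EC},\,T_{AOC},\,T_{AOA},\,g_i(T_{EC}),\,h_i(T_{AOC})\bigr)
              + \lambda\,\bigl(24 - T_{EC} - T_{AOC} - T_{AOA}\bigr) \\[4pt]
\frac{\partial \mathcal{L}}{\partial T_{EC}}
  &= \frac{\partial U_i}{\partial T_{EC}}
   + \frac{\partial U_i}{\partial G_{EC}}\,\frac{\partial G_{EC}}{\partial T_{EC}}
   - \lambda = 0 \\[4pt]
\frac{\partial \mathcal{L}}{\partial T_{AOC}}
  &= \frac{\partial U_i}{\partial T_{AOC}}
   + \frac{\partial U_i}{\partial G_{AOC}}\,\frac{\partial G_{AOC}}{\partial T_{AOC}}
   - \lambda = 0 \\[4pt]
\frac{\partial \mathcal{L}}{\partial T_{AOA}}
  &= \frac{\partial U_i}{\partial T_{AOA}} - \lambda = 0
\end{aligned}
```

Substituting \lambda = \partial U_i / \partial T_{AOA} from the third condition into the first two gives the two marginal conditions: the grade-related gain from the last hour of study equals the marginal utility of an hour of other activities minus the direct marginal utility of the study hour itself.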

Thus we have a behavioral function for 
the ith student which can be written as:

[1]  Effort = f_i([Ts], G)

where effort (an unobservable, to be approximated
by homework hours, reading hours,
attendance, etc.) depends on [Ts], a vector
of student tastes and interests, together with
G, the expected grade, an unobservable,
assumed to be a function of past grades, current
performance, and available technology,
as defined by pedagogical inputs available to
the student. In addition, "institutional" factors
such as attendance requirements may
enter directly. 5 [Ts] may be assumed exogenous
to any one course. G, to the extent
it depends on current as well as past performance,
is endogenous. Institutional constraints
are, of course, exogenous.

Thus a first-semester effort function can 
be written as: 

[1.1]  EFF_1 = f_i([Ts], [TEACH], G_i, GR^{Ec10}_{t-1}, ACH, [PED])

where

EFF_1 = effort devoted to the introductory
economics course by the ith student in
the first semester

[Ts] = vector of tastes: intellectual, social,
etc.

[TEACH] = vector of instructor characteristics:
empathy, classroom skills, preparation,
etc.

G_i = previous grades, other courses

GR^{Ec10}_{t-1} = previous grade in course

ACH = current performance in course

[PED] = educational inputs other than
instructor

It is not incompatible with a maximizing 
model to assume that in a two-semester 
course study habits of the first semester in- 
fluence choices in the second. The second- 
semester effort function then becomes 6 

[1.2]  EFF_2 = f(EFF_1, [Ts], [TEACH], GR^{Ec10}_{t-1}, ACH,
       [PED])

A natural second improvement is to relax 
the assumptions of linearity and additivity in 
the "production function," or achievement 
equation, 7 by choosing a more general functional
form. 8 The CES specification, in particular,
allows the effort-ability "isoquants,"
for example, to range from L-shaped curves 
to straight lines. Equation [0] then becomes: 

[2.1]  ACH_1 = A(\alpha_1 ABIL^{-\beta} + \alpha_2 EFF^{-\beta}
             + \alpha_3 TEACH^{-\beta})^{-1/\beta} + \alpha_4 PED

where ACH_1 = index of student mastery of
economics: test scores and grades in the first
semester; ABIL = vector of student abilities:
general intelligence, analytic ability, mathe- 
matical training, etc.; and EFF, TEACH, and 
PED are defined as before. Given that eco- 
nomics is a cumulative discipline, the second- 
semester production function becomes: 

[2.2]  ACH_2 = A(\alpha_1 ACH_1^{-\beta} + \alpha_2 ABIL^{-\beta}
             + \alpha_3 EFF^{-\beta} + \alpha_4 TEACH^{-\beta})^{-1/\beta}
             + \alpha_5 PED
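
The range of isoquant shapes that the CES form allows can be checked numerically. The sketch below is illustrative only: it uses two inputs with share weights that sum to one, omits the additive PED term, and the input values (2 and 8) are hypothetical. The relation between the parameter and the elasticity of substitution, 1/(1 + beta), is the standard CES result for this parameterization.

```python
import math


def ces(inputs, shares, beta, scale=1.0):
    """CES aggregate: scale * (sum_i shares_i * x_i**(-beta)) ** (-1/beta).

    Assumes the shares sum to one. beta == 0 is handled separately as
    the Cobb-Douglas limit, where the formula itself is undefined."""
    if abs(beta) < 1e-12:
        log_sum = sum(s * math.log(x) for s, x in zip(shares, inputs))
        return scale * math.exp(log_sum)
    total = sum(s * x ** (-beta) for s, x in zip(shares, inputs))
    return scale * total ** (-1.0 / beta)


def elasticity_of_substitution(beta):
    """Standard CES result for this parameterization."""
    return 1.0 / (1.0 + beta)


ability, effort = 2.0, 8.0   # hypothetical input levels
shares = [0.5, 0.5]          # hypothetical share weights

linear = ces([ability, effort], shares, beta=-1.0)        # straight-line isoquants
cobb_douglas = ces([ability, effort], shares, beta=0.0)   # unit elasticity
near_leontief = ces([ability, effort], shares, beta=200)  # close to L-shaped

print(linear, cobb_douglas, near_leontief)
```

With beta = -1 the inputs are perfect substitutes, at beta = 0 the function collapses to Cobb-Douglas, and as beta grows the output is governed by the scarcer input.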

Finally, the model can be expanded to in- 
clude what many consider to be an impor- 
tant product of an introductory course, 
namely, student satisfaction: not the answer 
to the question, "Are you having fun this 
minute?" but rather a more reflective answer 
to a question asked after the course, "Was it 
a worthwhile experience?" Such a satisfac- 
tion equation can be derived by returning to 
the maximizing model discussed earlier. 9 

Assume that students allocate their effort 
according to equation [1]. Given a student's 
achievement function, this effort (a cost) 
results in some grade, GR^{Ec10} (observable),
and some amount of intellectual satisfaction,
SAT (an unobservable). By our earlier assumption,
GR^{Ec10} and SAT are joint products.

Now we assume that, just as in the 
achievement equation students' effort was 
indexed by their ability, so, too, do they 
have different "happiness" technologies, 
defined by their talent for happiness, H (to
be measured by general satisfaction with col- 
lege life, etc.) and the educational inputs the 
course makes available to the individual stu- 
dent. Thus we can write: 

    U = H(SAT, GR^{Ec10}) - g(EFF)

where \partial U/\partial SAT > 0; \partial U/\partial GR^{Ec10} > 0;
\partial U/\partial EFF < 0. We cannot observe SAT; however,
if we assume that it depends on student tastes,
we can rewrite the above as:

[3]  ENJ = f(GR^{Ec10}, [Ts], H, [PED]) - g(EFF)

Since there is no reason to believe that enjoying
economics becomes a habit, the second-
semester equation is identical to the first.

The model described by equations [1]
through [3] represents one solution to the
problems inherent in the single-equation
linear model. By explicitly recognizing the
simultaneities and nonlinearities inherent in
education, and by extending the model to include
students' decisions about the allocation
of effort and judgments of satisfaction, it
should be possible to arrive at both more accurate
estimates of the marginal productivities
of educational inputs and broader
evaluations of the overall impact of any
pedagogic innovation. Furthermore, since
this model is merely a special case of the
more general maximizing model, its usefulness
in explaining student behavior will provide
one more piece of evidence on the appropriate
range of its application.

II. ESTIMATING THE MODEL 

The model described by equations [1]
through [3] can be estimated in three steps.
First, it is convenient to linearize equations
[2]; nonlinear estimation techniques exist but
they are expensive and artful. Taking
natural logs and making use of the Kmenta
approximation (i.e., an expansion of the
Taylor series around \beta = 0), 10 equation [2]
can be written as:

    \ln ACH = \ln A + \alpha_1 \ln ABIL + \alpha_2 \ln EFF
              + \alpha_3 \ln TEACH + \alpha_4 SPI + \alpha_5 CMI
              - (1/2) \alpha_6 (\ln ABIL - \ln EFF)^2
              + e,

where SPI = self-paced sections and CMI = 
case method sections. Intuitively, the 
squared term "corrects" for departure from 
Cobb-Douglas when ABIL and EFF take on 
extreme values. 
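
The role of the squared term can be seen by comparing the exact log of a CES function with its Kmenta approximation. The sketch below is a simplified two-input version under assumed values (share 0.4, beta 0.1, inputs 2 and 3); it is not the three-input specification estimated in the paper, and the function names are introduced here for illustration.

```python
import math


def ln_ces_exact(x1, x2, delta, beta, scale=1.0):
    """Exact log of scale * (delta*x1**-beta + (1-delta)*x2**-beta) ** (-1/beta)."""
    inner = delta * x1 ** (-beta) + (1.0 - delta) * x2 ** (-beta)
    return math.log(scale) - math.log(inner) / beta


def ln_cobb_douglas(x1, x2, delta, scale=1.0):
    """Log-linear (Cobb-Douglas) form: the beta -> 0 limit of the CES."""
    return math.log(scale) + delta * math.log(x1) + (1.0 - delta) * math.log(x2)


def ln_ces_kmenta(x1, x2, delta, beta, scale=1.0):
    """Kmenta approximation: Cobb-Douglas plus a squared log-ratio correction."""
    gap = math.log(x1) - math.log(x2)
    correction = 0.5 * beta * delta * (1.0 - delta) * gap ** 2
    return ln_cobb_douglas(x1, x2, delta, scale) - correction


# Hypothetical values chosen only to illustrate the approximation.
x1, x2, delta, beta = 2.0, 3.0, 0.4, 0.1
exact = ln_ces_exact(x1, x2, delta, beta)
kmenta = ln_ces_kmenta(x1, x2, delta, beta)
cobb = ln_cobb_douglas(x1, x2, delta)

print(f"exact={exact:.6f}  kmenta={kmenta:.6f}  cobb-douglas={cobb:.6f}")
```

For these values the approximation with the squared term tracks the exact CES more closely than the plain log-linear form does; the gap between the two grows as the inputs move further apart.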

Second, having specified that ACH and 
EFF are simultaneously determined, an alter- 
native to estimating equation [2] via or- 
dinary least squares is required. Fortunately 
tastes and precollege career plans are highly
correlated with effort but clearly exogenous 
to the course, as are Scholastic Aptitude Test 
scores and math preparation with achieve- 
ment. Thus consistent (and reasonable) esti- 
mates of equations [1] and [2] are obtained 
by applying two-stage least squares, i.e., by 
estimating 

    EFF = a_0 + a_1 [ABIL] + a_2 [Ts] + a_3 [TEACH]
          + a_4 [PED],

replacing EFF by its fitted value \hat{EFF} in equation [2], and
estimating equation [2] by OLS. This procedure
is now legitimate since \hat{EFF} is uncorrelated
with the error term.
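
The two-stage procedure can be sketched in a few lines. The example below is a deliberately reduced illustration, not the estimation actually performed: it uses one hypothetical instrument (a taste variable) in place of the full set of exogenous variables, simulated data, and assumed parameter values (0.25 and 0.5).

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(instrument, regressor, outcome):
    """Stage 1: regress the endogenous regressor on the instrument.
    Stage 2: regress the outcome on the stage-1 fitted values."""
    a0, a1 = fit_line(instrument, regressor)
    fitted = [a0 + a1 * z for z in instrument]
    return fit_line(fitted, outcome)[1]


# Hypothetical data: taste shifts effort but affects achievement only
# through effort, so it is a valid instrument by construction.
TRUE_B, FEEDBACK = 0.25, 0.5  # assumed values, not estimates from the paper
rng = random.Random(1)
taste, effort, achievement = [], [], []
for _ in range(20000):
    t = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)
    eff = (t + FEEDBACK * e) / (1.0 - TRUE_B * FEEDBACK)
    taste.append(t)
    effort.append(eff)
    achievement.append(TRUE_B * eff + e)

ols_b = fit_line(effort, achievement)[1]
tsls_b = two_stage_least_squares(taste, effort, achievement)
print(f"OLS = {ols_b:.2f}, 2SLS = {tsls_b:.2f}, true = {TRUE_B:.2f}")
```

In this setup the fitted effort values depend only on the exogenous taste variable, so the second-stage slope recovers the assumed coefficient while the direct OLS slope does not.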

The third step is to choose operational 
equivalents for each of the terms in equa- 
tions [1] through [3]. The body of available 
data, which included admissions data, data 
collected directly from students on "signed" 
questionnaires (with identification num- 
bers), and information from course and col- 
lege records, is very rich; conventions as to 
how many of the terms of the model should 
be measured are almost nonexistent. Thus 
experiments with data from spring 1973 were 
used to establish reasonable conventions for 
measuring tastes, effort, etc. These conven- 
tions were then adopted in 1974-1975 and 
1975-1976. In spite of this experimentation 
the errors-in-variables problem remains im- 
portant. Many of the observables that have 
the most explanatory power do not neatly 
correspond to one term of the model.

The data used are described briefly below. 



1. Student tastes and interests [Ts]. Direct 
information on student tastes and interests 
was obtained primarily from a specially 
designed section of the course questionnaire, 
which asked for priority rankings among 
lists of activities with responses ranging from 
"agree-disagree" to statements such as, "I feel 
there are more important things to do than 
study." Factor analysis was then used to 
identify major dimensions of interests: 
academic (ACINT), personal (PERINT), 
entrepreneurial (PROF/ENTR), political 
(POLITINT), and athletic (ATHINT), along 
which each student's interest could be
described. 11 Students were also asked for in- 
formation on their extracurricular activities. 

Variables used as proxies for tastes included 
race (MTY), year in college (FROSH), 12 and
sex. Variables used as proxies for the value
of the expected grade included SECON
(declared concentrator when entering course), 
PREMED, and a dummy P/F variable for 
students enrolled in the course on a graded/ 
ungraded basis. 

2. Instructor characteristics [TEACH].
Data were obtained on four "objective" in- 
structor characteristics: intellectual ability, 
as measured by college and graduate school 
grades (TEACHGRADES); teaching experi- 
ence (TEX); intellectual experience as 
measured by the overlap of academic specialization
with course content (TFSPEC); and
assignment policies, measured by the quality 
and quantity of problem sets and handouts 
provided (#PS). Student views of their in- 
structors were summarized by two variables: 
a rating on the intellectual competence of the 
teachers' performance (TEACHCOMP) which 
was a weighted average of responses to the 
questions on "well-prepared?" "explains 
clearly?" "handles questions well?"; and 
(TEACHNICE), an average of responses to 
ratings on their human qualities, "accessible?", 
"fair in grading?" and "cares about your learn- 
ing?" 

3. Other instructional inputs [PED].
Other than instructors, the major systematic 
differences in instructional inputs are the 
result of designating several sections as self-
paced (SPI) or case-method (CMI), the two
ongoing educational experiments. Both are 
entered as dummy variables. 

4. Grade: past (G_i, GR^{Ec10}_{t-1}), current, expected.
Data on students' previous grades are
obtained from college records for upperclassmen
and from admissions records for freshmen.
Previous course grade (GR^{Ec10}_{t-1}) is not
treated as a continuous variable. Harvard
grading is notoriously nonlinear. The distance
between a B+ and an A- is much
greater than between a B and B+; thus
\partial EFF / \partial GR^{Ec10}_{t-1} is not necessarily monotonic.
To allow for nonlinearities, past course
grade was entered as a set of dummy variables: i.e.,
D^{Ec10}_A, D^{Ec10}_B, etc.

5. Student effort [EFF]. Measures of stu-
dent effort included attendance, percent of 
reading done, percent of homework com- 
pleted, hours of study time and, as reported 
by students on the questionnaire, "time spent 
on this course relative to other courses." 
These reports were cross-checked against 
course records to ensure consistency. Three 
proxies for effort were also used: SECON, 
P/F, and RPD, a dummy variable equal to 1 
if a student was not in class to receive a ques- 
tionnaire. 

6. Student ability and preparation 
[ABIL]. Direct measures of student ability 
used are: Scholastic Aptitude Test scores 
(VSAT, MSAT), secondary school teachers'
ratings of a student's academic promise
(RATES1), and a Harvard admissions rating
on intellectual promise (RATES2). Precollege
plans to concentrate in the humanities 
(SHUMAN), high school math preparation 
(SUMS), race, and gender are used as prox- 
ies for the subtler components of ability. 13 
All ability data were obtained from admis- 
sions records under appropriate safeguards. 

7. Achievement [ACH]. The central
achievement measure (ACH) was the score 
on the multiple-choice portion of the 
semester final. This is a sixty-minute, forty- 
question security exam (i.e., no copies are 
allowed to leave the room), some portion of 
which is re-used. First-semester coverage in- 
cludes supply and demand, price theory, 
labor, industrial organization, and public
finance. The second-semester exam covers 
macroeconomics, trade, and comparative 
systems. Questions from both the College 
Level Exam of Economic Understanding 
(CLEU) and the Test of Understanding in 
College Economics (TUCE) are included. 
Over the last three years the correlation be- 
tween scores on the multiple-choice portion 
of the exam and the two-hour essay portion 
has ranged from .74 to .85.

8. Student satisfaction [ENJ, SAT]. Student
responses to the question, "How worthwhile
did you find this course?" are the chief
indicator of student enjoyment of the course 
(ENJ). Students' general happiness with col- 
lege life was measured along two dimensions. 
One was intellectual involvement (INTINV), 
measured by responses to questions such 
as, "I would prefer all courses pass/fail." 
The other was satisfaction with college life 
(COLSAT), measured by responses to questions
such as, "Harvard-Radcliffe has lived
up to my expectations."

In fitting the model these data were used in 
two forms. An overview of the process is 
best achieved by using one overall index of 
each input or output. Thus for the "forest" 
version of the model, factor analysis was used 
to construct synthetic "ability," "effort" and 
"instructor" variables from the many partial 
indexes. The loadings are shown below (see 
list of abbreviations for definitions): 



"Ability"    "Effort"     "Instructor" 14

VSAT         HOURS        TEACHGRADES
MSAT         ATTEND       TEACHEX
SUMS         % READING    TFSPEC
RATES2       RELTIME      TEACHCOMP
RATES1       RPD
P/F
SEX
RACE







This procedure, which extracts the "pure" 
component of each observable, also reduces 
the errors-in-variables problem created by 
observables such as P/F, year, and gender 
that do not correspond precisely to any one 
theoretical term in the model. 15 

Improving a course, however, requires 
knowledge not of the return to a metaphysical 
"effort" concept, but estimates of how much 
improvement an extra problem set will pro- 
duce. To produce this managerial informa- 
tion, a "tree" version of the model, in which 
numerous partial indices of each theoretical 
term were entered as independent variables, 
was also estimated. 



III. RESULTS 

The two-stage least-squares estimates of 
the effort and achievement equations for 
1975-1976 are presented in Tables 1, 2.1, 



2.21, and 2.22. In addition to the "forest" and 
"tree" versions described above, equations 
[2] were also estimated in linear form to allow 
comparisons with other studies in economic 
education. The ordinary least squares esti- 
mates of the enjoyment equation are pre- 
sented in Table 3. Results from earlier years 
are not presented; discrepancies, if any, are 
noted for each equation. The discussion of 
the results presented in each table is organized 
around the three central purposes of this ex- 
ercise: the search for general propositions 
about the process of economic education, 
the examination of narrower pedagogic 
issues about the effects of educational ex- 
periments and instructor characteristics; and 
the resolution of the methodological ques- 
tions of the usefulness of an economic approach
to education.

A. Effort (Equations [1.1] and [1.2]) 

1. General propositions. The results 
presented in Table 1 suggest three proposi- 
tions about the effort allocation process. The 
first is the importance of students' tastes
(INTINV, PROF/ENTR, etc.) in their decisions
as to how to allocate their time. Student
interests, proxies for student interests,
and involvement in extracurricular activities, 
including term-time employment, explain 
roughly twice as much of the variance in ef- 
fort as any other group of variables. 

The second proposition concerns the role 
of current performance and previous grades. 
The coefficient on previous grades, G_i,
whether at Harvard or in secondary school, 
is robustly insignificant. The coefficients on 
previous course grades and on VSAT, MSAT, 
and SUMS (instruments for predicting cur- 
rent achievement) are consistently negative. 
Furthermore, the higher the recent or current 
grades, the lower the level of effort. The old
maxim, "give them an A, and they'll stop 
working" seems to have some statistical 
validity. A comparison of results for the first 
and second semesters provides additional in- 
sight into the complex motivational role of 
course grades. Early (i.e., in October) a B-
is a motivating grade: B- students work
harder, ceteris paribus, presumably in hopes
of becoming A- students. However, students
who receive a B- at the end of the first semester
work less during the second semester;
it is possible that they perceive the nonlinearity
of Harvard grading and resign themselves
to being on the wrong side of the great
divide.

TABLE 1  Effort: All Students

                  First Semester             Second Semester
Variable          Coeff.       F-Ratio       Coeff.       F-Ratio

SATCOL            -.0416       (.081)         .0682       (.431)
INTINV             .6261       (3.80)         .3196       (.872)
PERINT             .1029       [illegible]    .1512       (1.239)
[illegible]       -.3315       (1.506)       -2.082       (.640)
PROF/ENTR         -.6769       (5.150)       -.1579       (.267)
ATHLETICS          .0473       (.013)         .1917       (.258)
POLITINT           .8428       (2.802)       -.1693       (.184)
JOB               -.8347       (3.153)        .0082       (.000)
EXTRA             -.7598       (2.291)        .1042       (.072)
MTY                .0576       (.010)        -.2488       (.185)
FROSH              .5409       (2.173)        .2699       (.492)
SEX               -.2073       (.258)         .0432       (.008)
SECON             1.2938       [illegible]   [illegible]  (4.629)
P/F              -2.8279       (22.76)       -1.722       (3.731)
PREMED             .1165       (3.98)         .0776       (1.78)
TEACHCOMP         -.1889       (1.800)       -.0614       (.497)
[illegible]       -.1221       (.782)         .3217       (.007)
TEACHGRADES       -.0197       (.009)         .0297       (.084)
#PS                .1901       (.687)        -1.333       (1.377)
SPI               1.2976       (6.371)        .2987       (.384)
CMI                .1598       (.86)          .0017       (.091)
A                 [illegible]                 .0163       (.837)
[illegible]       [illegible]                -.2884       (.364)
B-                 .2888       (1.317)       -.7851       (2.964)
C-, C, C+         1.1989       (4.911)       -.1014       (.022)
D, E              [illegible]  [illegible]   -.4184       (.313)
VSAT              [illegible]  [illegible]    .0006       (.046)
MSAT              -.0078       (8.417)        .0041       (1.723)
SUMS              -.4299       (3.079)       -.5827       (4.888)
SMART            -1.1172       (2.083)        .1574       (.033)
EFF_1                                         .6369       (103.154)
Constant          18.163                      2.078
R^2                .3547                      .5027
SEE               2.8656                      2.619

The third proposition is the importance of 
habit. The correlation between first- and 
second-semester effort is .64. Given the first- 
semester effort, estimates of the effort equation
in first-difference form reveal that
only a change from graded to pass/fail status
appears to have a substantial impact on student
effort allocation.

2. Pedagogic issues. The results with 
respect to the pedagogical questions are mixed. 



The coefficients on both problem sets assigned 
and SPI are positive, although the coefficient 
of problem sets is significant only when the 
variable is entered in log form. Examination 
of the distribution of effort scores among the
self-paced group indicates that this dif- 
ference is due primarily to the compression 
of the left-hand tail. Apparently past some 
point (in study time, about three hours per 
week), the demands of SPI or additional prob- 
lem sets lead to substitution among varieties 
of effort; structured assignments drive out 
unstructured reading. The coefficient of the 






case-method dummy is consistently insignifi- 
cant. This result may simply be a reflection 
of the small number of cases assigned each 
term rather than any general proposition 
about the labor intensity of the case method. 

The coefficients on all instructor attributes 
are generally small and insignificant. This is 
surprising. A student who draws a superior 
teacher can learn the same amount with less 
effort. This "income effect" might lead the 
student to work less. There is also a substitu- 
tion effect, since the price of a given grade in
terms of effort is lower in the course taught 
by the superior teacher. It seems implausible, 
however, that the substitution effect and the 
income effect should cancel precisely. Yet 
this result emerges in all three years. 

3. The maximizing model. These re- 
sults are generally encouraging with respect to 
the question of the appropriateness of a max- 
imizing model. The taste coefficients have 
the "right" signs; would-be entrepreneurs, 
aspiring lawyers, and students with strong 
academic interests work more; those with 
strong personal concerns work less. The 
coefficient of SECON is large and positive; 
the coefficient of P/F is strongly negative; 
students planning a concentration in econom- 
ics work more, while those taking the course 
pass/fail work less. Students who have gotten
an A (and thus in some sense "overinvested")
correct their behavior and work less
in the next round. Students who choose to 
take the course pass/fail during the second 
semester reduce their efforts on all dimen- 
sions: attendance, reading, and study time. 
Perhaps the major departure from what a 
pure maximizing model would have predicted 
is the apparent importance of first-semester 
habits during the second semester. This may, 
however, simply reflect the imperfections of 
the taste measures or stability in relative 
returns over the course of the year from
other courses and other activities. 

B. Achievement

1. General propositions. Of the re- 
sults reported in Tables 2.1, 2.21, and 2.22, 
four are particularly informative. 

a. The importance of learning to learn. In 
the beginning, i.e., during the first semester,
the coefficients of ability (ABIL), student effort
(EFF), and instructor (TEACH) in Table
2.1 are, respectively, .84, .04, and .39. Interpreted
as elasticities, these coefficients suggest
that a 1 percent increase in ability has
roughly ten times the effect of an equal increase
in effort. However, during the second
semester the effort coefficient rises to .18,
meaning that an extra effort unit (more
study time, more reading, and more frequent
class attendance) is more productive during
the second semester than the first.

To some extent this increase may merely 
reflect the use of a better ability measure 
(i.e., first-semester achievement) rather than
the students' acquisition of a better learning 
technology. However, the results of estimating
the detailed version of equation (2) (Tables 
2.21 and 2.22) suggest that learning how to 
learn does occur. The pedagogic and instruc- 
tional variables are uniformly less significant 
during the second semester. The coefficients 
of instructor grades and numbers of problem 
sets are insignificant and occasionally per- 
verse. Self-paced instruction, which raises 
scores by about 15 percent in the first semes- 
ter, has no significant effect in the second. 
When effort is entered in disaggregated form,
the coefficients of the structured time vari-
ables (class attendance and problem sets) are
large and significant during the first semes-
ter; the coefficient of HRS is very small: an
extra hour of study time is worth .10 points 
on the exam. By the second semester, how- 
ever, an extra hour of study time yields .27 
points; HRS explains about 15 percent of the 
variance in achievement, while ATTEND 
and #PS become insignificant.

Obviously this finding is open to other in-
terpretations. Nevertheless its implications 
concerning the scope for better pedagogy 
and for the allocation of instructional 
resources are profound. 

b. The multiple dimensions of ability. 
When the composite ability variable (ABIL) 
is replaced by its components, VSAT,
MSAT, SUMS, SHUMAN, etc., the R² of
the equation (corrected for degrees of 
freedom) improves sharply. It appears that 
the "ability" relevant to achievement in 
economics is multidimensional, including, at 
a minimum, general verbal skills, mathe- 
matical aptitude, and mathematical prepara- 
tion. Several clues suggest a fourth dimen- 
sion. Even after formal mathematical prepa- 
ration, mathematical aptitude, and effort are 
controlled for, students who intended to con- 
centrate in humanities, women, and pass/fail 
students score lower. One can read into 
these results the existence of a fourth dimen-

TABLE 2.1 Achievement; "Forest" Version (Log Form)

                  First Semester          Second Semester
                  Coeff.    t-Ratios      Coeff.    t-Ratios
ABIL              .8419     115.14        .8552     13.006
EFF*              .0401     2.249         .1764     8.068
TEACH             .3372     8.827         .0961     1.687
SPI               .0786     6.987         .0028     .002
ACH*                                      1.0896    79.872
(ABIL - EFF)²     .0402     .761          .0232     1.108
Constant          .0938                   3.4491
R²                .3351                   .4326
SEE               .20397                  .4233

TABLE 2.21 Achievement; All Students; "Tree" Version (Log Form)

                      First Semester          Second Semester
                      Coeff.    t-Ratios      Coeff.    t-Ratios
VSAT                  .5091     (40.11)       .6499     (7.378)
MSAT                  .5422     (29.711)      .7026     (6.0?4)
SUMS                  .0710     (7.415)       -.0316    (.135)
SHUMAN                -.1129    (13?68)       -.0856    (.929)
SEX                   -.0073    (.112)        -.1073    (2.717)
MTY                   .0197     (.376)        .0939     (1.087)
FROSH                 .0167     (.696)        .0311     (.282)
HRS† (hours, 0-20)    .0201     (.867)        .1502     (2.132)
ATT                   -.0713    (11.961)      .0172     (.638)
RPD                   .0644     (9.324)       .0421     (1.941)
TEACHCOMP             -.0073    (.153)        -.0251    (.227)
TEACHGRADES           .3374     (6.765)       .2017     (1.871)
#PS                   .0756     (5.780)       .0155     (1.368)
P/F                   -.1210    (2.938)       -.1909    (3.398)
SPI                   .0787     (.7448)       .0089     (.012)
CMI                   .0001     (.008)        .0012     (.000)
ACH 1                                         1.0415    (60.622)
Constant              .4633                   9.3006
R²                    ?793                    .4928
SEE                   .1939                   .4029


sion of ability: a talent for the peculiar 
model-building that is the heart of economics.
If in addition one cared to postulate that 
students know something about their analytic 
ability, a puzzling result of the last section 
could be explained. Students who know them- 
selves lacking in this analytic ability (as op- 
posed to general intelligence) take the course 
pass/fail. It is then not surprising that dif- 
ferences in past performance not related to 
analytic ability, i.e., past grades in human-
ities courses, have no effect on the effort
allocation decision.

c. Returns to scale. These results suggest 
that there are increasing returns to scale; that
if ability, effort, and instructional quality 
are increased simultaneously, achievement 
will increase more than proportionately. 
This is somewhat surprising; it suggests the 
introductory course has scope enough to oc- 
cupy even very bright students for a year. 
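One way to read the returns-to-scale claim, sketched here under the assumption that the log-form coefficients quoted earlier (.84, .04, and .39) are the relevant elasticities, is that their sum exceeds 1. The helper names below are hypothetical and the calculation is illustrative, not a reproduction of the author's test.

```python
# Elasticities quoted in the text for the first-semester log-form equation.
ELASTICITIES = {"ABIL": 0.84, "EFF": 0.04, "TEACH": 0.39}


def scale_elasticity(elasticities):
    """Sum of input elasticities; a value above 1 means increasing returns to scale."""
    return sum(elasticities.values())


def output_multiplier(elasticities, input_multiplier):
    """Factor by which ACH grows when every input is multiplied by input_multiplier."""
    return input_multiplier ** scale_elasticity(elasticities)


print(f"sum of elasticities: {scale_elasticity(ELASTICITIES):.2f}")
print(f"all inputs up 10 percent, ACH multiplier: {output_multiplier(ELASTICITIES, 1.10):.3f}")
```

With a sum above 1, a 10 percent increase in all three inputs raises predicted achievement by more than 10 percent, which is the sense of "more than proportionately" in the text.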

d. Substitution possibilities. To some ex-

TABLE 2.22 Achievement; All Students; "Tree" Version (Linear Form)

                First Semester               Second Semester
                Coeff.        t-Ratios       Coeff.        t-Ratios
VSAT            .0206         (48.509)       .0178         (16.536)
MSAT            [illegible]   [illegible]    [illegible]   [illegible]
SUMS            [illegible]   [illegible]    [illegible]   [illegible]
SHUMAN          -2.5129       (12.718)       -.7753        (.593)
SEX             [illegible]   [illegible]    [illegible]   (8.443)
MTY             [illegible]   [illegible]    [illegible]   [illegible]
FROSH           .4040         [illegible]    [illegible]   [illegible]
HRS†            .0990         (.630)         .2706         (8.459)
ATT             [illegible]   [illegible]    [illegible]   [illegible]
RPD             [illegible]   [illegible]    [illegible]   [illegible]
TEACHCOMP       .0090         (.001)         -.0391        (.455)
TEACHNICE       -.0002        (.000)         .0021         (.002)
TEACHGRADES     .7769         (9.534)        .2430         (.455)
#PS             [illegible]   [illegible]    [illegible]   [illegible]
P/F             [illegible]   [illegible]    [illegible]   [illegible]
SPI             2.210         (7.510)        .4689         (.25)
CMI             .1183         (.731)         .0639         (.087)
ACH 1                                        .8126         (166.2)
Constant        15.4330                      26.873
R²              .4346                        .5792


tent knowledge of the "elasticity of substitu- 
tion" among educational inputs is less helpful 
than in other processes; students cannot 
"move along" an ability-effort isoquant, nor 
can they recruit brighter instructors. Never-
theless, it is instructive to identify several
such isoquants that show relative marginal 
productivities of ability and effort. They are 
relatively steep. 

Questions of substitutability among other 
inputs are explored at length in Section IV 
where the assumption of constant elasticity 
is relaxed. 

2. Pedagogic issues. The results for
the pedagogical and instructional variables
are somewhat limited by the assumptions of
the model, in particular the assumption that
$\partial^2 \mathrm{ACH} / \partial \mathrm{ABIL}\,\partial \mathrm{TEACH} > 0$. Neverthe-
less, two pedagogic findings are among the
most surprising of the entire study. The
first is that instructional inputs (pedagogical
methods and instructor characteristics) do
matter. The coefficients of SPI and TEACH-
GRADES are consistently positive and
significant. This result may imply that the
production function approach is most valu-
able when the data set has been disaggregated
to the point where problems of differing ob-
jective functions among instructors are less
acute.

The second set of surprises concerns the 
winners and losers among instructional in- 
puts. Self-paced instruction appears to have 
a significant effect on student learning dur- 
ing the first semester. Ceteris paribus, self- 
paced students have scores about 12 percent 
higher than students in conventional sec- 
tions. Students using the case method do not 
score better than their counterparts in con- 
ventional sections. 16 The self-paced coeffi- 
cient is small during the second semester, 
probably reflecting both the fact that only 
six weeks of the second semester are self- 
paced and the increasing ability of students 
to learn economics on their own. 

Similarly, the value added of a good in- 
structor, particularly during the first
semester, is significant. For a student of 
average ability who puts the average effort 
into the course, a switch from the worst to 
the best instructor has a predicted value of 
about 4 points on an exam with a mean score 
of 21, or the equivalent of 200 SAT points. 
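The SAT equivalence can be cross-checked with a line of arithmetic. The sketch below assumes the conversion uses the first-semester linear-form VSAT coefficient of .0206 exam points per SAT point reported in Table 2.22; the text does not say which coefficient was used, so this choice and the helper names are hypothetical.

```python
# Assumption: exam points per SAT point, taken from the first-semester VSAT
# coefficient in the linear-form table (Table 2.22). The text does not state
# which coefficient underlies its "200 SAT points" figure.
EXAM_POINTS_PER_SAT_POINT = 0.0206
MEAN_EXAM_SCORE = 21.0      # mean exam score quoted in the text
INSTRUCTOR_GAIN = 4.0       # worst-to-best instructor gain quoted in the text


def sat_equivalent(exam_points, points_per_sat=EXAM_POINTS_PER_SAT_POINT):
    """SAT points that would produce the same predicted exam gain."""
    return exam_points / points_per_sat


def share_of_mean(exam_points, mean_score=MEAN_EXAM_SCORE):
    """Exam gain expressed as a fraction of the mean exam score."""
    return exam_points / mean_score


print(f"SAT equivalent of {INSTRUCTOR_GAIN} exam points: {sat_equivalent(INSTRUCTOR_GAIN):.0f}")
print(f"share of mean score: {share_of_mean(INSTRUCTOR_GAIN):.1%}")
```

Under this assumption the 4-point gain corresponds to a little under 200 SAT points, in line with the rounded figure in the text.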



What matters about an instructor, 
however, are graduate school grades, assign-
ments per student, and (in the single year for 
which such data are available) self-reported 
hours of preparation. These three character- 
istics have a significant correlation with stu- 
dent achievement. At the mean, for example, 
each additional point in the grade point 
average of the instructor translates into 
roughly an increase of .75 point in a 
student's score on the final exam. Student 
ratings of instructors on human qualities 
(TEACHNICE) or intellectual competence 
(TEACHCOMP) are uncorrelated with stu-
dent achievement. This finding is not sen- 
sitive to the reweighting of ratings, or to the 
grade point average of students whose ratings 
are used. It is consistent over the three-year 
period. Neither instructor's experience (TEX) 
nor professional specialization is significant. 
The former is perhaps attributable to the 
narrow range of experience offered by this 
group, but it is somewhat surprising that in- 
structors are not more effective the second 
time around. 

These results do not suggest that instruc- 
tors do not matter, or that students' reac-
tions to their instructors are unimportant. 
The results of the enjoyment equation, for 
example, imply that the instructor is the 
largest single factor affecting a student's en- 
joyment of the course and choice of concen- 
tration. But these findings do suggest that 
student opinions about their instructors 
reflect factors other than their instructors' 
contributions to their learning when learning 
is measured by exam performance. 

A third, less startling but equally impor- 
tant finding is implicit in the discussion of 
the previous section. When students begin 
their study of economics, there is very little 
they can do to help themselves. The marginal 
productivity of the input over which they ex- 
ercise exclusive control — study hours — is 
very low. Given that in the short run their 
abilities are fixed (although over time they 
could increase their level of mathematical 
preparation), the only inputs that have a 
significant effect on their performance (at-
tendance, the number and quality of prob-
lem sets assigned, and the ability of the in-
structor) are almost entirely out of their
control in the typical course structure.

3. The maximizing model. At a 
general level the results presented in Tables 



2.1-2.22 are consistent with a maximizing 
model of student learning. All the inputs 
have positive signs; their magnitudes and 
changes over time seem plausible. Slightly 
over half the variance in exam scores is ex- 
plained in the most successful equation, 
which is probably close to an upper bound, 
given the reliability of the exam. Two find- 
ings, however, suggest that the neoclassical 
model is not entirely applicable. The coeffi- 
cient of the correction term (ABIL - EFF)²,
while insignificant, was generally positive. 
This suggests that this process does not ex- 
hibit diminishing marginal returns, i.e., that 
the ability-effort isoquants might be straight 
lines. Section IV presents a set of regressions 
that explores this problem in greater detail. 
Second, the evidence that there is "techno- 
logical progress" in learning economics raises 
the possibility that the estimated coefficients 
represent "average" productivities based 
partly on students who are not transforming 
effort and ability into achievement efficiently. 

C. Enjoyment 

1. General propositions. Estimates 
of the enjoyment equation are presented in 
Table 3. They suggest three observations. 
First, the typical student arrives at his 
evaluation of Economics 10 by a fairly 
straightforward process. He asks himself 
three questions: (1) "What grade did I get?"
(2) "Did my instructor run a good class?" (3)
"Was my instructor nice?" The three vari-
ables that answer these questions, GR(Ec 10),
TEACHCOMP, and TEACHNICE, account
for 70 percent of the explained variance in
students' retrospective enjoyment of the
course. Among those three variables the en-
joyment elasticities of TEACHNICE are slight-
ly higher than teachers' grades.

Second, as in the effort equation, student 
tastes count, explaining about 25 percent of
the variance. Students interested in entre- 
preneurship enjoy the course more and those 
interested in "people" enjoy the course less. 
Variables that are proxies for tastes (gender,
race, and intended concentration) are also
significant. Tastes appear to develop over
time. The second-semester coefficient of 
gender is large and negative; women appear 
to reflect on the course less fondly over time. 
Conversely, minority-group students like it 
better. The decrease in the size of the

TABLE 3 Enjoyment; All Students (Log Form)
(range, 1-7; 1 = high)

                        First Semester               Second Semester
                        Coeff.        t-Ratios       Coeff.        t-Ratios
[illegible]             [illegible]   [illegible]    [illegible]   [illegible]
INTINV                  .2020         (1.617)        .0107         (.004)
PERINT                  .1071         (.313)         -.0179        (.008)
ATHINT                  .0202         (.034)         -.0797        (.384)
POLITINT                -.0801        (.415)         .0773         (.471)
PROF/ENTR               .3328         (5.563)        .3566         (6.371)
MTY                     -.3890        (2.195)        -.5527        (4.42)
SEX                     -.0186        (.010)         -.6183        (8.909)
SECON                   -.7057        (12.89)        -.2460        (1.715)
TEACHEX                 .0014         (.119)         -.0021        (.287)
TEACHCOMP               .0054         (8.003)        .0698         (8.751)
TEACHNICE               1 142         (11.261)       .0981         (5.44)
TEACHGRADES             -.0529        (.287)         -.0563        (.239)
#PS                     .0327         (.347)         .1351         (5.077)
SPI                     -.1511        (.395)         -.1281        (.309)
[illegible]             -.4117        [illegible]    [illegible]   [illegible]
P/F                     -.0938        (.113)         -.1325        (1?9)
EFF                     [illegible]   (.917)         -.0685        (7.72)
ACH                     -.0074        (.135)         .0063         (.167)
SEMESTER GRD (0-15)     -.0799        (2.94)         -.1654        (13.41)
NOV GRADE (0-15)        -.0913        (5.840)
Constant                3.5100                       4.725
R²                      .3509                        .3661
SEE                     1.2503                       1.3241


SECON coefficient in the second semester 
may reflect a taste for the more analytic 
micro material covered in the first term. 

Exam score (ACH) has no independent ef- 
fect on enjoyment; apparently the major 
source of satisfaction is not in knowing, but 
in having one's instructor know that one 
knows. 

2. Pedagogic questions. The most 
interesting results about instructional inputs 
emerge from a comparison of Tables 2 and 3. 
Precisely those variables, TEACHNICE and
TEACHCOMP, which are consistently in-
significant in the achievement equation have
large and significant coefficients in the
enjoyment equation. Conversely, TEACH-
GRADES and #PS are insignificant in the en-
joyment equation, but significant in the
achievement equation. Both self-paced and
case-method instruction exhibit a similar
pattern. SPI increases achievement but not
enjoyment even after controlling for addi-
tional effort; the converse is true for CMI. It
is interesting that while still significant, the 
coefficient of CMI is about one-third its size 
in the first year of the experiment. 

These results do not imply that there is a 
trade-off between student achievement and 
enjoyment. Rather, enjoyment and achieve- 
ment are simply "produced" with different 
instructional resources. If additional peda- 
gogic inputs are costly, careful thought 
should be given to the achievement-enjoy- 
ment package to be produced. 

3. The maximizing model. The impli-
cation of these results for the appropriateness 
of the economic model is unclear. The taste 
and achievement variables have the expected 
signs. Contrary to our original hypothesis, 
however, there appears to be no significant 
negative relationship, ceteris paribus, be- 
tween the effort a student reports investing
in the course and his enjoyment of the
course. In the first semester the relationship,
although negative, is insignificant; the
second-semester coefficient, although small,
is perverse. (However, when the equation is
estimated in first-difference form, i.e., when
ΔENJ is the dependent variable, EFF has the
correct sign.) It may be that for some students
enjoyment is the "plug" factor; given his 
allocation of time, a student in retrospect ad- 
justs his evaluation so as to make it rational. 
Alternatively, the assumption that an hour's 
effort produces identical grade-enjoyment 
bundles for all students may be inappropriate. 17 

IV. SUBREGRESSIONS 

This section presents estimates of the 
model for three pairs of subpopulations: 
students of above- vs. below-average ability, 
men vs. women, and self-paced vs. conven- 
tional-format students. It has two purposes. 
The first is to resolve some of the issues raised 
in the previous section, particularly those of 
the substitution possibilities and scale factors 
among inputs. The second is to demonstrate 
the power of this model by attacking two 
questions that have been explored at length 
in the economic education literature with the 
standard single-equation model: (1) the ef- 
ficacy of self-paced instruction and (2) the 
consistently inferior performance of female 
students in introductory economics courses. 

A. Unresolved Issues 

The results discussed in Section III left 
unresolved three questions with major im- 
plications for the allocation of instructional 
resources. One is the elasticity of substitu- 
tion between ability and effort, where at-
tempts to test the hypothesis that it differed
significantly from 1 were inconclusive. A sec-
ond was the appropriateness of the assump- 
tion of positive cross products: that the 
marginal product of better instructors was 
higher for bright students than for duller 
ones. 18 The third was the finding that this 
process was characterized by increasing returns 
to scale, a finding that runs counter to both 
the standard economic assumptions and much 
of the education literature. 

Fortunately, each of these questions is em- 
pirically testable. If at extreme values the 
elasticity of substitution between effort and
ability decreases, the coefficients of ABIL
should be lower for the brighter student, and
those of EFF lower for the less able student.
If the assumption that $\partial^2 \mathrm{ACH} / \partial \mathrm{TEACH}\,\partial \mathrm{ABIL} > 0$
is appropriate, the coefficients
of the instructor variables (TEACHGRADES, 
#PS, etc.) should be larger the brighter the 
student. Finally, if the process is characterized 
by increasing returns to scale, the coeffi- 
cients on all variables must be larger for the 
most able than for the least able. Thus each 
of these issues can be investigated by re- 
estimating the model for the top and bottom 
quarters: roughly those with MSAT scores 
above 725 and below 650. 
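The subsample comparison can be sketched as follows. The student records are invented for illustration, the field names are hypothetical, and a bivariate slope stands in for the full multivariate equation, so the example shows only the mechanics of splitting at the MSAT cutoffs and comparing coefficients across groups.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (bivariate regression with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def split_by_msat(students, top_cut=725, bottom_cut=650):
    """Return (top, bottom) groups using the MSAT cutoffs quoted in the text."""
    top = [s for s in students if s["MSAT"] > top_cut]
    bottom = [s for s in students if s["MSAT"] < bottom_cut]
    return top, bottom


def effort_slope(group):
    """Slope of achievement on effort within one group."""
    return ols_slope([s["EFF"] for s in group], [s["ACH"] for s in group])


# Invented records: effort pays off more for the bottom group by construction.
STUDENTS = [
    {"MSAT": 760, "EFF": 4, "ACH": 24.0},
    {"MSAT": 740, "EFF": 8, "ACH": 24.4},
    {"MSAT": 750, "EFF": 12, "ACH": 24.8},
    {"MSAT": 700, "EFF": 6, "ACH": 21.0},
    {"MSAT": 680, "EFF": 10, "ACH": 22.0},
    {"MSAT": 620, "EFF": 4, "ACH": 16.0},
    {"MSAT": 640, "EFF": 8, "ACH": 18.0},
    {"MSAT": 630, "EFF": 12, "ACH": 20.0},
]

top, bottom = split_by_msat(STUDENTS)
print(f"top-quarter effort slope:    {effort_slope(top):.2f}")
print(f"bottom-quarter effort slope: {effort_slope(bottom):.2f}")
```

Students between the two cutoffs drop out of both groups, which mirrors the comparison of extremes described above.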

Those results are presented in Tables 4.1, 
4.21 and 4.22. Results for the instructor 
characteristics included in this analysis are 
surprisingly (and somewhat sadly) suppor- 
tive of the assumption of positive cross prod- 
ucts. In particular, brighter instructors seem 
to be much more useful to brighter students. 
The coefficient of TEACHGRADES is 
positive and significant for the top quarter. 
TEACHGRADES is negative but not signif- 
icant for the bottom quarter. Similarly, prob- 
lem sets seem to be more useful to the 
brighter student. In contrast the SPI format 
appears to be considerably more productive 
for the bottom quarter. The coefficient of
SPI for the high group is not significant; for
the bottom quarter it appears to add about 3 
points to their exam score, or the equivalent 
of 150 points in SAT scores. 

The regressions provide no strong evidence 
of increasing returns to scale over the entire 
range. The sum of the coefficients for the 
brightest and the least bright students is
slightly below 1, suggesting a region of con- 
stant returns. 

The issue of substitution is more compli- 
cated. The coefficient of ABIL is lower for
the brightest students, as predicted, although 
comparisons of the coefficients of SUMS and 
VSAT suggest that some components of in- 
telligence may not be subject to diminishing 
returns. The coefficients of EFF and HRS, 
however, are larger for the bottom quarter. 

Finally, as a by-product these results shed
some light on two puzzles. The first is the 
finding that instructors have little effect on 
the allocation of student effort. At the lower 
end of the distribution, objective instruc- 
tional quality appears to have a negative ef- 
fect on effort. The bottom quarter is sen-

TABLE 4.1 Effort; Students Grouped by Ability
(top = MSAT above 725; bottom = MSAT below 650)

                 Top Quarter                  Bottom Quarter
                 Coeff.        t-Ratios       Coeff.        t-Ratios
INTINV           0.4089        (0.425)        0.557         (0.686)
PERINT           1.2035        (2.209)        -2.2716       (3.860)
POLITINT         -0.5839       (1.324)        -0.4896       (0.716)
PROF/ENTR        0.2146        (0.148)        [illegible]   [illegible]
ATHLETICS        0.0754        (0.009)        0.2841        (0.101)
WRITING          1.7723        (3.317)        0.9484        (0.766)
JOB              -0.9811       (1.150)        [illegible]   [illegible]
EXTRA            -1.0671       (1.474)        -1.2595       (1.233)
MTY              -1.6488       (1.451)        0.6806        (0.578)
FROSH            0.5665        (0.647)        1.1978        (2.213)
SEX              -0.0516       (0.004)        0.0297        (0.001)
SECON            1.3364        (2.777)        0.5539        (0.390)
P/F              -2.4501       [illegible]    [illegible]   [illegible]
PREMED           0.6124        (4.011)        0.4531        (1.887)
TEACHCOMP        -0.0243       (0.074)        -0.5713       (5.678)
TEACHNICE        -0.1121       (0.732)        -0.2103       (0.832)
TEACHGRADES      0.0772        (0.038)        -0.9199       (4.076)
#PS              0.2581        (1.735)        -0.4337       (3.930)
SPI              1.0404        (0.984)        2.7833        (12.654)
CMI              0.3307        (1.239)        0.0153        (0.019)
[illegible]      0.0054        (0.040)
A                0.3154        (1.154)        4.4508        (8.101)
B-               1.0814        (1.842)        -2.2009       (8.346)
C                0.0756        (0.005)        -1.3649       (2.321)
D, E             -1.1280       (0.037)        -1.6168       (5.346)
VSAT             -0.0196       (1.471)        0.0978        (4.563)
MSAT             -0.0417       (3.472)        0.1099        (7.270)
SUMS             -1.0275       (4.310)        2.8256        (15.300)
Constant         31.069                       36.4282
R²               .3835                        .7100
SEE              2.92773                      1.9573


sitive to TEACHGRADES (the higher the
grade, the less the effort) and number of prob- 
lem sets (more problem sets translate into 
less effort): a class geared too high is ap- 
parently discouraging. The coefficient of 
TEACHCOMP, however, is large and sig- 
nificant; the bottom group works much 
harder when an instructor is perceived as
pedagogically competent. 

The second is the perverse sign on EFF in 
the enjoyment equation. For the top quarter, 
EFF and HRS are consistently insignificant; it 
may be that these students are sufficiently 
talented so that over a low minimum, time
spent on the course is consumption. 



B. Some Standard Problems: A Re-analysis 

1. Self-paced instruction. Evaluating an 
innovation is the most common occasion for 
an article in economic education. As noted 
in Section I, the heart of these evaluations is 
usually the estimation of a single-equation 
model: 

$$
\mathrm{ACH} = a + b_1(\mathrm{MSAT} + \mathrm{VSAT}) + b_2\,\mathrm{SEX} + b_3\,\mathrm{YEAR}
  + b_4\,[1 = \text{student had calculus}]
  + b_5\,[1 = \text{student trying new pedagogy}] + e
$$



TABLE 4.21 Achievement; "Forest" Version; Top- vs. Bottom-Quarter Students
(figures in parentheses below coefficients are t-ratios)

              First Semester               Second Semester
              Top Quar.     Bot. Quar.     Top Quar.     Bot. Quar.
ABIL          .6335         .785           .8606         .9368
              (10.58)       (1.308)        (3.73)        (3.688)
EFF           -.0378        .0669          .1258         .2259
              (.519)        (.92)          (1.619)       (1.38)
TEACH         .3412         .1613          .1789         [illegible]
              (6.135)       (3.187)        (.081)        [illegible]
SPI           .0617         .1165          .0862         .0334
              (1.087)       (3.927)        (.444)        (.069)
ACH 1                                      1.0879        1.2586
                                           (44.864)      (26.98)
Constant      1.1564        1.0682         -3.5508       -4.733
R²            .1114         .2168          .3358         .4773
SEE           .2352         .2126          .4441         .4075


to yield a gross estimate of the effect of the 
innovation on achievement. 
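A minimal sketch of how such a single-equation model is estimated follows. The data are invented and noise-free, the coefficient values are hypothetical, and the small least-squares solver is written out only to keep the example self-contained; it is not the procedure used in the studies discussed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(rows, ys):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(obs[i] * obs[j] for obs in design) for j in range(k)] for i in range(k)]
    xty = [sum(obs[i] * y for obs, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented illustration data. Columns: (MSAT + VSAT) in hundreds of points,
# SEX, YEAR, calculus dummy, new-pedagogy dummy.
ROWS = [
    (12.0, 0, 1, 0, 0),
    (13.0, 1, 1, 1, 0),
    (14.0, 0, 2, 1, 1),
    (12.5, 1, 2, 0, 1),
    (13.5, 0, 3, 0, 1),
    (14.5, 1, 1, 1, 1),
    (11.5, 1, 3, 1, 0),
    (15.0, 0, 2, 0, 0),
]
TRUE_COEFFS = [2.0, 1.2, -0.5, 0.3, 0.8, 1.5]  # a, b1, ..., b5 (hypothetical)
ACH = [TRUE_COEFFS[0] + sum(b * v for b, v in zip(TRUE_COEFFS[1:], row)) for row in ROWS]

estimates = ols(ROWS, ACH)
gross_effect_of_innovation = estimates[5]  # b5, the coefficient on the pedagogy dummy
print(f"estimated b5: {gross_effect_of_innovation:.4f}")
```

Because effort does not appear on the right-hand side, the coefficient on the pedagogy dummy in a model of this kind mixes the direct effect of the innovation with any change in effort it induces, which is the sense in which the estimate is "gross."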

The analysis of SPI already presented 
moved past the analysis of an innovation con- 
ventionally presented in several respects. Ex- 
plicit recognition of effort as an endogenous 
variable permitted unbiased estimates of the 
SPI effect. Introduction of the effort equa- 
tion allowed the identification of a "pure" 
self-paced effect, i.e., its effect net of in- 
creases in effort. Estimates for the top and 
bottom quarters revealed something about
the distribution of the gains. 

The Ec 10 model, however, permits the in- 
vestigation of how SPI improves perfor- 
mance to be carried one step further. More 
specifically if the model is re-estimated for 
SPI and for conventional format students 
separately and the coefficients of the effort, 
ability, and instructor variables are com- 
pared, it is possible to discover how SPI 
changes the educational production process.
Some of the results of doing so are presented 
in Tables 5.1, 5.2, and 5.3. 

Vis-à-vis the conventional group the most
significant differences are as follows. (1) The 
coefficient of ATTEND (class attendance) is 
consistently significant for the conventional 
group, but larger and significant for the self- 
paced group. (2) The returns to mathematical 
preparation (SUMS) are only about half as 
large for the self-paced group as for the con-
ventional group. (3) Instructor variables 
TEACHGRADES and #PS are significant for
the conventional group, but not for the self-
paced group; they add nothing to the predic-
tive power of the SPI achievement equation. 
(4) The traditional gender disadvantage 
disappears; the performance of self-paced 
women is not significantly different from 
that of men. 19 (5) In the effort equation, cur- 
rent performance is negatively related to ef- 
fort for students in conventional sections; it 
is insignificant for self-paced students. (6) 
Most important, the first-semester coeffi- 
cient of HRS (work out of class) for self- 
paced students is as large as the correspond- 
ing second-semester coefficient for conven- 
tional students, or triple the first-semester 
coefficient for conventional students. 

Taken as a whole these results suggest 
several hypotheses about how SPI works. Its 
major contribution appears to flow from its 
effect on student effort. It is somewhat im- 
perialistic in that it encourages students to 
spend time on the course regardless of their 
current level of performance. At the same 
time, by providing the student with a highly 
structured learning environment and well- 
defined study strategies, it raises the produc- 
tivity of a student's own time and effort. In 
the vocabulary of production theory, this is 
a labor-saving rather than a capital-saving 
technology. With this labor-enhancing ef- 
fect, self-paced students choose to spend 
more time on the course, which is reassur- 
ing, as it implies that achievement in 
economics is not an inferior good. 

At the same time, these results suggest the 
limits of instructional inputs. For students in

TABLE 4.22 Achievement; "Tree" Version; Top- vs. Bottom-Quarter Students (Linear Form)

                            First Semester                                    Second Semester
               Top Quar.               Bottom Quar.             Top Quar.                 Bottom Quar.
               Coeff.      t-Ratios    Coeff.      t-Ratios     Coeff.      t-Ratios      Coeff.      t-Ratios
VSAT           .0238       (1?.68)     .0195       (?.809)      .0155       (3.188)       .0176       (?.327)
MSAT           .0219       (1.667)     .0084       (.617)[a]    .0362       (2.24)        .0209       (2.369)
SUMS           1.4833      (8.894)     1.081       (3.286)      .1297       (.031)        -.5421      (.490)
SECON          -.1027      (.012)      -.6479      (.326)       -.0112      (.007)        .3695       (.062)
SHUMAN         -3.3429     (5.507)     -1.691      (1.332)      -1.1163     (.343)        .3229       (.027)
SEX            .0093       (.000)      -.1966      (.036)       -1.3585     (1.159)       -2.984      (4.192)
MTY            2.1129      (?.734)     .0769       (.005)       1.1198      (2.48)        .8065       (.309)
FROSH          .8848       (1.001)     1.1367      (1.265)      -21320      (.033)        -1.289      (.938)
HRS            -.1280      (.943)      .0866       (.283)       .2525       (2.203)       .4788       (4.60?)
ATT            -.8199      (5.725)     -.8042      (2.321)      -.0361      (.004)        .1836       (.060)
RPD            -1.7319     (4.067)     -1.1935     (1.553)      .5348       (.253)        .3871       (.101)
TEACHCOMP      -.0336      (.209)      .0756       (.525)       -.0849      (.447)        .0943       (.574)
TEACHGRADES    .6479       (1.997)     -.0690      (.013)       .3705       [illegible]   -.1364      (.060)
#PS            .5353       (5.157)     .1097       (.145)       -.1617      (.168)        .1652       (.169)
P/F            -2.4062     (3.20?)     -.6089      (.125)       -.8760      (.223)        -1.798      (4.609)
SPI            1.225       (1.001)     2.5809      (3.607)      1.2961      (.586)        -1.080      (.004)
ACH*                                                            .91?0       (82.06)       .8057       (40.81)
Constant       14.336                  7.2966                   -47.44                    28.4461
R²             .3229                   .2684                    .554                      .5370
SEE            4.7973                  4.7??9                   5.7526                    5316

[a] Most students in this group have scores between 630 and 650; hence, the variance of SAT is low.

TABLE 5.1 First-Semester Effort;
Self-paced Students Only

                 Coeff.        t-Ratios
INTINV           1.929         (2.258)
PERINT           1.3342        (.845)
POLITINT         -1.9145       (2.979)
PROF/ENTR        [illegible]   [illegible]
ATHLETICS        -.6772        (.217)
WRITING          1.4954        (.834)
JOB              [illegible]   [illegible]
EXTRA            [illegible]   [illegible]
MTY              -.0338        (.02)
FROSH            [illegible]   [illegible]
SEX              [illegible]   [illegible]
SECON            1.5815        (1.639)
P/F              -4.558*       (.589)
PREMED           .1137         (.726)
TEACHCOMP        -.0024        (.000)
TEACHNICE        .1121         (.004)
TEACHGRADES      -.0710        (.043)
#PS              .1146         (.008)
[illegible]      .0028         (.167)
[illegible]      .7650         (.233)
B-               1.2123        (.535)
C                2.2849        (1.297)
D, E             2.4378        (1.253)
ACH              -1.225*       (7.418)
Constant         18.5154
R²               .1071
SEE              3.5689

ACH* is the regression of VSAT, MSAT, SUMS, and
SMART on ACH.

a self-paced section the instructor's general 
intelligence, preparation, and assignments
are uncorrelated with achievement. To the 
extent that assignments, for example, resemble 
unsupervised SPI, these findings reinforce 
the suggestion that as a learning device, prob- 
lem solving is subject to diminishing mar- 
ginal returns. 

2. Explaining differences in male-female 
achievement. The question of why men 
consistently outperform women on standard- 
ized tests of economic achievement has been 
discussed at great length in the economic 
education literature. The range of explana- 
tions includes fathers who talk about 
business and finance to their sons but not
their daughters, inadequate mathematical 
preparation, and fears of success as defemin- 
izing. 20 Thus to ask that this model offer new 



TABLE 5.2 Achievement; First-Semester
Self-paced Students Only;
"Forest" and "Tree" Versions

                 Coeff.        t-Ratios
"Forest" Version
ABIL             .8168         (29.114)
EFF†             .1522         (1.677)
INST             .0128         (.337)
Constant         .6282
R²               .3178
SEE              .1433

"Tree" Version
VSAT             .0127         (3.170)
MSAT             [illegible]   [illegible]
SUMS             .4503         (4.320)
SECON            -.0466        (.022)
SHUMAN           [illegible]   [illegible]
SEX              -.1128        (.123)
MTY              .3405         (.034)
FROSH            1.9550        (2.646)
HRS†             .2527         (2.459)
ATT†             -.8309        (2.043)
RPD              -.1241        (.011)
TEACHCOMP        -.0222        (.039)
TEACHGRADES      .5561         (.958)
#PS              -.4224        (.261)
P/F              1.9649        (.109)
Constant         5.3515
R²               .4445
SEE              3.8584


insights on the gender-related differential is a 
fair test of its usefulness. 

The results presented in Tables 2.1 and 2.2 
provided one insight into this differential, 
namely, that the gap appears to be cumula- 
tive. In November, men and women perform 
equally well, in January, men perform slightly 
better, and by June they perform substantially
better. Moreover, the insignificant coeffi-
cient on gender for both semesters of the ef-
fort equation indicated this cumulative gap
was not attributable to a reduction in effort 
by women students. 

When equations [1] and [2] are estimated
separately for men and women, however, an
even more interesting result emerges (Tables
6.1-6.3, pp. 190-93). The most striking
difference between the male and female 
results is in the effort coefficients. The 
marginal product of HRS, an additional 
TABLE 5.3 First Semester Enjoyment;

Self-Paced Students Only

(range 1-7: 1 = high) 





Coeff. 


t-Ratios


SATCOL 


-.6165 


(L\465) 


INTINV 


.2937 


(.298) 


HER INT 


-.7589 


(1.697) 


ATHINT 


.3451 


(.767) 


POLITINT 


.5018 


(1.243) 


PROF/ENTR 


8206 


(2 274) 


MTY 


- 8476 


(1 824) 


SEX 


6675 


(1 302) 


FROSH 


.3196 


(.346) 


SECON 


- 7967 


(2.139) 


TEACH EX 


-.6332 


(.686) 


TEACHCOMP 


- 0625 


(.504) 


TEACHNICE 


.1905 


(3.502) 


TFACHGRAOFS 


- 0536 


( 024) 


§P$ 


-.2968 


( 342) 


P/F 


.9903 


(-151) 


EFF 


0545 


(618) 


ACH 


-.0742 


(.981) 


SEMESTER GRD 


0618 


(.244) 


Constant 


( 


3 9578 


R 1 




1128 


SEE 


1 482 



hour spent by a woman on the course, is less 
than half the marginal product of an addi- 
tional hour spent by a man. Furthermore, 
during the second semester, when male "mar- 
ginal effort product" (i.e., the coefficient of 
HRS) increases sharply in both size and 
significance, female marginal effort product 
remains almost zero: there is no significant 
difference between the first- and second- 
semester coefficients for women. 
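
The comparison of marginal effort products above can be checked with a short calculation. The sketch below uses the HRS coefficients as they appear in the "Tree" version of Table 6.2 in this reprint; the dictionary layout and the helper name are illustrative assumptions, not part of the original study.

```python
# HRS (hours of study) coefficients as printed in the "Tree" version of
# Table 6.2. The data layout and function name are illustrative assumptions.
HRS_COEFF = {
    ("male", 1): 0.0309,
    ("female", 1): 0.0099,
    ("male", 2): 0.1810,
    ("female", 2): -0.0068,
}


def female_to_male_ratio(semester):
    """Ratio of the female to the male marginal product of an hour of study."""
    return HRS_COEFF[("female", semester)] / HRS_COEFF[("male", semester)]


if __name__ == "__main__":
    ratio = female_to_male_ratio(1)
    growth = HRS_COEFF[("male", 2)] / HRS_COEFF[("male", 1)]
    print(f"first-semester female/male ratio: {ratio:.2f}")
    print(f"male second- to first-semester ratio: {growth:.1f}")
```

On these printed values the first-semester female coefficient is roughly a third of the male coefficient, which is consistent with the "less than half" statement in the text.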

These results suggest the following ac- 
count of the source of differential perfor- 
mance. Women enter the course with less 
"skill" in learning analytic material— less 
practice in that peculiar intellectual exercise 
of model building. 21 Thus learning analytic
material comes slowly; much of each hour
spent studying is misdirected. Consequently 
at the end of the semester they have learned 
relatively little economics per unit of time 
and even less economic theory. Nor have 
they learned on the average to master 
analytic material as well as their male 
counterparts. 

Thus in the second semester women are, 



given the cumulative nature of economics, 
doubly disadvantaged. Since, as the size of
the coefficient of GR Ec10 in Table 6.1 in-
dicates, they are more sensitive to grades
than their male counterparts, they do not 
reduce their effort in the second semester. 
But as indicated by the large negative coeffi- 
cient on gender in the second-semester en- 
joyment equation, they ultimately find the 
experience relatively unsatisfying. 

This account is supported by three addi- 
tional pieces of evidence. The first is a set of 
regressions, not reported here, in which the
dependent variable of equation [2] is the 
subscore on theory questions only. In these 
regressions the gender differential for the 
first semester is approximately 15 percent 
and significant at the 1 percent level, sug- 
gesting that the problem does lie in the 
analytic area. The second, noted earlier, is 
the insignificance of the gender differential in 
self-paced sections, where drill in analytic 
materials is most intense. The third is 
another series of subregressions in which 
equation [3] is estimated for the hard-core 
humanities (SHUMAN) concentrators (those
who planned on a humanities concentration 
before entering). The pattern of first- and 
second-semester effort coefficients for this 
group is very similar to the women's pattern 
while the gender coefficient is insignificant. 

V. CONCLUSIONS 

Traditionally a conclusion serves several 
purposes. It summarizes the findings, 
sounds the major themes, reviews the prin- 
cipal caveats and suggests directions for 
future research. The summary and major 
themes have already been provided in the in- 
troduction; therefore these concluding re- 
marks deal with caveats and extensions. 

The first set of caveats concerns the depen-
dent variable. All of the analysis presented
here deals with the determinants of short-run
mastery of economics. While it seems plausible 
that the determinants of short-run mastery 
and those of long-run retention are similar, it 
is a hypothesis that remains to be tested. If 
they are not, then this analysis is much less 
interesting. In addition, even as short-run 
analysis, the dependent variable measures 
one dimension of mastery. While the correla- 
tion between multiple-choice scores and 
scores on essay questions is reassuringly 



TABLE 6.1 First Semester Effort; Male and Female Students 





Male 




Female 




Coeff. 


t-Ratios


Coeff.


t-Ratios


INTINV 


,0618 


(.028) 


1.9282 


(1.834) 


PERINT 


.7260 


(2.210) 


-.9880 


(.305) 


POLITINT


-.6392 


(3.893) 


- 3192 


v-137) 


PROF/ENTR. 


-.8356 


(6.019) 


- .2264 


(.021) 


ATHLETICS 


.2984 


(.400) 


- 6904 


(.129) 


WRITING 


1.2414 


(4.702) 


-1.1373 


(.230) 


JOB 


— 1 . 1 




.9706 


( 225) 








- 4081 


(031) 


MTY 


.0924 


(.019) 






FROSH 






.7389 


( 215) 


SECON 


1.583 


(11.410) 


.1261 


(.005) 


P/F 


-3.4030 


(20.719) 


-5.3755 


(3.447) 


PREMED 


.1238 


(1 871) 


— 




TEACHCOMP 


- 0715 


(3.107) 


-.1566 


(1.260) 


TEACHNICE 


-.0014 


(.002) 


-.0214 


(.073) 


TEACHGRADES 


.00 lU 




1 APR 


(.045) 


iPS 


__ 1QQC 

— i oyo 




— .2630 


( 320) 


SPI 


1.3560 


(5.149) 


1 901 


(1.874) 


cm 


.1602 


(.172) 


- 


- 


A 


.3395 


(.479) 


.9259 


(173) 


B- 


0470 


(.006) 


. l4 lU 


\.\AJD) 


C-.CC+ 


1 2009 


(3.436) 


1.3574 


(.418) 


D, E 


— .3842 








ACH 1 


.5421 


(13.096) 


-1.0729 


(2.370) 


Constant 


20.4838 




17 4759 




R l 


.3196 




.197 




SEE 


2.7416 




4.1097 





high, there is no direct evidence for the rela- 
tionship between exam scores and the prob- 
lem-solving skills that some instructors have 
made the focus of an introductory course. 
Again, while it would be surprising if 
students who did badly on standard material 
did well at problem solving, it is a relation-
ship that needs to be investigated. 22 In some
sense, these questions about the dependent 
variable are the special case of the larger prob- 
lem of measurement. Comparatively little is 
known about appropriate measures of ef-
fort, analytic ability, and tastes. It is
therefore difficult to be confident that one 
has dealt satisfactorily with the errors-in- 
variables problem. 

The principal questions and extensions 
suggested by this work are four. The first 
and most obvious is to test another group of 
introductory economics students. In par-
ticular it would be interesting to find a

school where the upper quarter had MSAT 
scores around 650 and discover whether our 
"bottom-quarter findings" were a relative or 
absolute phenomenon. Second, as noted 
above, these results hold for a standard in- 
troductory course. It would be enlightening 
to compare these coefficients with those ob- 
tained by estimating the same model for an 
institutionally oriented or problem-based 
course. Third, because there has been no
substantial change in curriculum or course 
structure (apart from the SPI and CMI ex- 
periments), this study has provided no data 
on the productivity of "nonteacher" inputs. It 
would be informative to re-estimate this 
model in a situation where there was a 
greater variety of format (some large- 
lecture, some small-section), changes in 
texts, or changes in depth or breadth of the 



TABLE 6.2 Achievement; Male and Female Students; "Forest" and "Tree" Versions 



First Semester Second Semester 



Male Female Male Female 





Coeff. 


t-Ratios


Coeff. 


t-Ratios 


Coeff. 


t-Ratios


Coeff. 


t-Ratios


"Forest" Version


















ABIL 


.927 


(124.249) 


.7887 


(31.238) 


1 .8559 


(80.374) 


1 341 


(6.631) 


efP 


.0431 


(2.420) 


.0234 


(.102) 


.2712 


(14.012) 


- 2287 


(2.036) 


INST 


.3794 


(8.662) 


.1785 


(.573) 


0498 


(1.116) 


1.4912 


(2.062) 


SPf 


.0793 


(5.642) 


.0486 


(.930) 


.1202 


(1 857) 


.1 129 


(.400) 


Constant 




-.1259 




6492 




2.6526 




4.2859 


R* 




3053 




.3223 




.2778 




3142 


SEE 




.2123 




.1689 




.4721 




4783 


"Tree" Version 


















VSAT 


.0218 


(40 832) 


01601 


(5 8t*7) 


0173 


(13.101) 


0174 


(2.381) 


MSAT 


.0236 


(34.508) 


.0280 


(14.982) 


.0160 


(3.765) 


0179 


(1 yy<£) 




.8167 




.5992 


(1.048) 


.4153 


(3 765) 


- 1.1585 


(1.992) 


SHU MAN 


-3 2876 


(13.009) 


- 1 .0098 


(.830) 


-.6437 


(270) 


- 1.1345 


(.346) 


MTY 


.5697 


(.453) 


1.2113 


(.663) 










FROSH 


.5389 


(1.097) 


.4252 


(211) 


-.2715 


(.154) 


.2833 


(.028) 


HRSf 


.0309 


(1.0846) 


.0099 


(.005) 


.1810 


(12.436) 


-.0068 


(.001) 


ATT f 


-.5464 


(4.268) 


-.7560 


(3 907) 


-.0561 


(063) 


- 0916 


(1.53) 


RPD 


1 2224 


(4.873) 


.5083 


(.265) 


.8099 


(1 491) 


-.3218 


(.047) 


TEACHCOMP 


- 0137 


(.078) 


.0226 


(-071) 


- .0858 


(.615) 


0984 


(.288) 


TEACHGRADES 


.7988 


(7.511) 


.5434 


(988) 


.1779 


(.207) 


8389 


(.678) 


IPS 


.4401 


(8.939) 


.4393 


(2.188) 


- .2335 


(1.047) 


.1807 


(.119) 


P/F 


-1.3723 


(2 080) 


-2.1362 


(2.208) 


-1.927 


(.865) 


-3.7218 


(2/09) 


SPI 


1.8583 


(6.463) 


1.1394 


(1 646) 


4312 


(176) 


1.4828 


(.409) 


ACH 1 










/744 


(128 025) 


.8473 


(21.745) 


Constant 


-17 1705 




12.4916 


-23 785 


-38 395 


R 1 




.4461 




.4024 




5796 




.557 


SEE 




4.4942 




4.2198 




5.4662 




5.807 
TABLE 6.3 Enjoyment; Male and Female Students

(range, 1-7: 1 = high) 



Male 



Female 









t-Ratios






t-Ratios


SATCOL 


-.0389 




(093) 


-.7564 




(4.080) 


INTINV 


-.2282 




(1.71) 


.0074 




(.001) 


PERINT 


— 0690 




\.1U4) 


oloo 




\ 1 UOJJ 


ATHINT 


— .0018 




(.004) 


1Q1 

— iyi 






POUTINT 


— .Uo/4 




(.s37s3) 


H707 
Kit Ci 




t was 


PROF/ENTR 


.3894 




(b too) 


— 4003 






MTY 


-.34161 




(1.384) 


- 4559 




(.329) 


FROSH 


.0921 




(.261) 


.1010 




/ noo\ 
{ \JOO) 


SECON 


- .3359 




(2 577) 


- 8776 




(2.953) 


TEACHEX 


2303 




(1 OOO) 


.4040 






TEACHCOMP 


0664 




(o.oiUj 


ncoe 

.UOob 






TEACHNiCE 


1306 




(11 617) 


0688 




(.407) 


TEACHGRADES 


- 0305 




^ 087) 


- 0001 




(.016) 


#PS 


- .0018 




(001) 


1 mo 

iuiy 




( ooyj 




0111 




(002) 


- 0695 




(.008) 


P/f 


- 1481 




(1 205) 


0764 




(119) 


EFF 


-.3078 




(1 736) 


.0123 




(.068) 


ACH 


- 0085 




(142) 


- 0361 




(033) 


SEMESTER GRD 


- 1247 




(7 253) 


- 1935 




(2.674) 


Constant 




2 5495 






5 6207 








3509 






1121 




SEE 




1.2375 






1 648 





syllabus. It would be useful to compare these 
results with those in a course where macro 
was taught first. Finally, as the R 2 suggests, 
there is still a considerable amount to be 
learned about the determinants of student ef- 
fort and enjoyment. 

LIST OF ABBREVIATIONS 

ABIL = student ability 

ACH = student mastery of economics (0-40) 

ACH 1 = achievement in first semester 

ACH f = fitted estimate of ACH 

ACINT = academic interests (1 = very 

important; 4 = not at all) 
AOA = all other activities 
AOC = all other courses 
ATHINT = athletic interests (1 = very 

important; 4 = not at all)
ATT or ATTEND = attendance (1-7; 1 = 

always attends) 
ATT f = fitted estimate of ATT

CLEU = College Level Exam of Economic 
Understanding 



CMI = case-method sections (1 = yes;
0 = no) 

COLSAT = satisfaction with college life 

Dfr 10 , etc. = dummy variable for past grade 
in introductory economics 

e = error term
EC = economics

Ec10 = introductory economics course

EFF = student effort (0-35)

EFF f = fitted estimate of EFF using exogenous
variables only
ENJ = student satisfaction with the course

FROSH = freshman year of college (1 = 
freshman; 0 = other) 

G = expected grade
G P = previous grades in other courses
GR Ec10 = previous grade in introductory
economics 

H = happiness 

HRS = hours of study (0-20) 

HRS f = fitted estimate of hours of study
INTINV = intellectual involvement (1 =
not at all; 4 = very much) 

MSAT = score on mathematics portion of
Scholastic Aptitude Test (200-800) 
MTY = race (1 = black, Chicano, Indian; 
0 = other) 

NOV GRADE = grade in November of first 
semester of Economics 10 

OLS = ordinary least squares

PED = innovation (1 if participating, 0 
otherwise) 

[PED] = educational inputs other than
instructor 

PERINT = personal interests (1 = very 

important; 4 = not at all) 
P/F = pass-fail (1 = yes; 0 = no) 
POLITINT = political interests (1 = very 

important; 4 = not at all) 
PREMED = premedical student 
PROF/ENTR = entrepreneurial interests (1 = 
very important; 4 = not at 
all) 

§PS = quality and quantity of problem sets 
and handouts; (number of problem 
sets varies) 

RATESl = secondary school teachers' rat- 
ings of a student's scholastic 
promise 

RATESl = Harvard admissions ratings of a 

student's scholastic promise 
% READING = percent of reading assign- 
ments done 
RELTIME = time spent on Economics 10 

relative to other courses 
RPD = not in class to receive questionnaire 
(0 = no questionnaire; 1 = ques- 
tionnaire) 

SATCOL (same as COLSAT) = satisfaction with college life
(1 = not at all; 4 = very much)

SAT = intellectual satisfaction 

SECON = economics concentrator (1 = 

economics concentrator; 0 =

other) 

SEE = standard error of estimate 
SEX = 1 if female; 0 if male 
SHUMAN = plan to concentrate in human- 
ities 

SMART = 1 if MSAT at least 750 and SUMS 
= 3, 4 

SPI = self-paced section (1 = yes; 0 = no)
SUMS = preparation in mathematics in high 
school (1-4) 

T = student time 

TEACH = instructor characteristics 
[TEACH] = vector of instructor character- 
istics 

TEACHCOMP = intellectual competence
of teacher (1 = high; 
7 = low) 

TEACHEX 

(same as TEX) = teaching experience (num- 
ber of years) 
TEACHGRADES = grades of instructor 
in college and graduate 
school (0-15) 
TEACHNICE = human qualities of instruc- 
tor (1 = high; 7 = low) 
TEX = teaching experience (number of years) 
TFSPEC = intellectual experience 
[Ts] = vector of student tastes 
TUCE = Test of Understanding in College 
Economics 

U i = utility received by ith student

VSAT = score on verbal portion of Scho- 
lastic Aptitude Test (200-800) 



FOOTNOTES 

1. With self-paced instruction (SPI) the student is given a list of objectives for each unit of the course. Whenever a 
student believes he has mastered a unit he comes in and takes an exam on that unit. The exam is graded pass/fail.
If he fails, there is no penalty; however, he must return to take a similar exam on the same unit. A student's grade
depends only on the number of units he ultimately passes, not on the number of attempts. The purpose of SPI is to
provide the student with abundant, relatively nonthreatening feedback. 

2. For examples of this approach see any issue of the Journal of Economic Education or any empirical paper on 
economic education published in the May Proceedings issue of the American Economic Review. 

3. Another possible source of bias lies in the errors-in-variables problems created by the imperfection of the observed 
effort term or ability term. This problem is discussed at greater length in Section II.

4. This is merely a translation from conventional production theory where a behavior function is introduced to take 
account of the maximizing process of the firm; i.e., of the possibility that disturbances are "transmitted." See Y.



Mundlack and I. Hoch, "Consequences of Alternative Specifications in Estimation of Cobb-Douglas Production 
Functions," Econometrica, October 1966. 

5. One can, of course, treat institutional features in the manner of the Chicago school, i.e., as parameters in the max- 
imizing process. It seems more useful, however, to be less dogmatic about the pervasiveness of the maximizing 
process and allow institutional features to enter directly. 

6. This formulation assumes that first-semester grades within the course wash out the effects of previous grades in 
other courses. 

7. There is no a priori reason to assume that a linear specification of the effort or enjoyment equations is appropriate
either. Examination of the residuals with a variety of functional forms, however, suggested that a linear specifica- 
tion was appropriate. 

8. There are other ways of relaxing this assumption, for example by the introduction of interaction terms or (as is 
done in Section IV of this paper) piecewise estimation. 

9. Within social psychology there is a literature on the satisfaction of the college student which could be used to 
refine this formulation . 

10. J. Kmenta, "On Estimation of the CES Production Function," International Economic Review, June 1967. 

11. It is encouraging that the constellation of tastes thus defined corresponds closely to those used in several major 
studies of college students' interests and priorities. 

12. In the Harvard system, students who wish to concentrate in economics are strongly urged to take the course in the 
freshman year; those who suspect they will not like the "economic way of thinking" typically put it off. Thus year
of enrollment is a good guide to interest. 

13. The imperfections of these measures of "analytic ability" are potentially the source of bias in the effort coefficients 
if this largely "unobservable" component of ability is correlated with effort. 

14. TEACHNICE was not included because it was clearly arrayed along another dimension. 

15. For a discussion of this and other approaches to the errors-in-variables problem, see Zvi Griliches, "Errors in
Variables and Other Unobservables," Econometrica, November 1974. 

16. CMI is consistently insignificant and is omitted in other achievement equations.

17. Some evidence for this hypothesis emerges when the data set is stratified by ability. For the top quarter, effort and
enjoyment are positively (although insignificantly) correlated; the correlation is negative and significant for the
bottom quarter.

18. The inappropriateness of this assumption is, for example, cited by Bowles as a major objection to the use of this
function for analyzing data from primary and secondary schools (see Samuel Bowles, "Toward an Educational
Production Function," in Education, Income and Human Capital, edited by W. Lee Hansen [New York: National
Bureau of Economic Research, 1970]).

19. This result holds even when the dependent variable is not the score on all questions but the score on analytic ques-
tions only.

20. For a survey of the issues and evidence, see "Male-Female Differences in Economic Education: A Survey," John
Siegfried, Journal of Economic Education, Spring 1976.

21. The question of why there might be a systematic difference in analytic power would then, of course, become the 
central question.

22. This may be one explanation for the insignificance of CMI.
THE EFFICIENCY OF EDUCATION IN ECONOMICS

THE EFFICIENCY OF PROGRAMMED LEARNING IN TEACHING 
ECONOMICS: THE RESULTS OF A NATIONWIDE EXPERIMENT* 

By Richard E. Attiyeh, University of California, San Diego, G. L. Bach, Stanford University 
and Keith G. Lumsden, Stanford University 



Summary 

Recently considerable interest has de- 
veloped in the use of programmed learn- 
ing in the field of economics. While pre- 
vious experiments 1 have been on a small 
scale, the results suggest that programmed 
texts can be very effective teaching mate- 
rials. This paper reports the results of a 
larger, nationwide experiment, involving 
forty-eight schools and 4,121 students, de- 
signed to evaluate the efficiency of pro- 
grammed materials in teaching the core 
micro- and macroeconomics sections of the 
typical elementary economics course. The 
major results of the study are: 

1. On average, by spending twelve 
hours studying a programmed learning 
text students learned practically as much 
micro- or macroeconomics as did students 
in seven weeks of a conventionally taught 
elementary course. 

2. On the basis of the test question 
breakdowns, students who used only pro- 
grammed learning materials, as compared 
to conventionally taught students, per- 
formed better on "applications" of theory 
than on simple "concept recognition." 

3. Students had a generally positive at- 
titude toward programmed learning. 

* The authors are grateful to the Joint Council on
Economic Education for financial support, to Pren-
tice-Hall, Inc., and McGraw-Hill, Inc., for providing
programmed materials at cost, and to participating
colleges and universities for their unstinted coopera-
tion.

1 See K. G. Lumsden, "The Effectiveness of Pro-
grammed Learning in Elementary Economics," A.E.R.,
May, 1967, and "Technological Change, Efficiency
and Programming in Economics," in New Develop-
ments in the Teaching of Economics, ed. K. G. Lums-
den (Prentice-Hall, Inc., 1967).



These findings suggest that the basic 
concepts and tools of micro- or macroeco- 
nomics can be self-taught in about two 
weeks' time with programmed learning 
materials, thereby freeing a much larger 
portion of the total course to help stu- 
dents develop skills in the application of 
the basic theory to "real world" problems. 

Experimental Design 

The principal objective of this experi- 
ment was to compare the performance of 
students using programmed learning, ei- 
ther by itself or as a supplement, with 
that of students taking a conventionally 
taught elementary course. Student perfor- 
mance was measured by test scores on an 
independently devised, nationally normed 
multiple choice examination taken when 
the micro- or macroeconomics topics were 
completed. To separate the effects of the 
programmed materials from other vari- 
ables, information was also obtained on 
the educational level, sex, and scholastic
aptitude of students, the type, size, and 
quality of schools attended, and the text- 
book, class size, and experience of teacher 
for conventional sections. 

Each participating school established 
three test groups. Students in Group I 
were given copies of one of two pro- 
grammed texts and were told to study this 
text exclusively (avoiding other texts and 
not coming to class) for a period of time 
determined independently by each school. 2 

2 The programmed texts used were R. C. Bingham,
Economic Concepts: A Programmed Approach (Mc-
Graw-Hill, Inc., 1966) which contains both micro
and macro sections; or either K. G. Lumsden, R. E.
Attiyeh, and G. L. Bach, Microeconomics: A Pro-
On average, students in this test group
had three weeks to read the programmed
book, of which they used, according to 
their responses on a questionnaire, only 
twelve hours. At the end of this period 
they were tested, and then rejoined or be- 
came a conventional class. 

In both Groups II and III, students 
were given conventional reading assign- 
ments and attended class lecture and dis- 
cussion sessions. Students in Group II, 
however, were also required to read a pro- 
grammed book, generally at a time and 
pace of their own choosing, while students 
in Group III were asked not to use the 
programmed book. In each school these 
two groups were tested at the same time, 
usually several weeks later than Group I. 
Each school had as much time as it 
wanted to cover the basic micro- or macro-
economic analysis. On average, students
in these two groups had seven weeks to
prepare for the examination and students 
in Group II spent eight hours reading the 
programmed book. 

The tests used were preliminary forms 
(two micro and two macro) of the Test of
Understanding in College Economics 
(TUCE) prepared at the suggestion of 
the A.E.A. Committee on Economic Edu- 
cation by a special committee sponsored 
by the Joint Council on Economic Educa- 
tion. 3 To avoid possible contamination of 

grammed Book (Prentice-Hall, Inc , 1966) or R. E. 
Attiyeh, K. G. Lumsden, and G. L. Bach, Macro- 
economics: A Programmed Book (Prentice-Hall, Inc., 
1967). Approximately two-fifths of the students in 
Group I used the Bingham text; one-third of the stu- 
dents in Group II used the Bingham text. Others used 
one of the Attiyeh, Lumsden, and Bach texts. All
programmed learning and test materials were pro- 
vided free to participating schools. Each school was 
allowed to choose the programmed text preferred, 
subject only to the constraint that each book be used 
by a typical cross-section of schools. 

3 For permission to use these tests we are grateful
to the Psychological Corporation (304 East 45th
Street, New York, New York, 10017) and to Professor
Rendigs Fels, chairman of the TUCE committee,
whose paper at this session provides more information



the results for Groups II and III from the 
earlier testing of Group I, the test form 
for these two groups was different from 
that given to Group I. 

The sample was drawn to provide ade-
quate coverage for five main types of col-
leges and universities: "high prestige"
schools, liberal arts colleges, large univer- 
sities, state colleges (often formerly called 
teachers colleges), and junior colleges. 4 Of 
the 4,121 students on whom we were able 
to acquire a complete file of data, 84 per- 
cent were male. The average student was 
a sophomore and had the equivalent of an 
SAT composite score of 1108. The aver- 
age school had 10,620 students and an en- 
tering class with the equivalent of an SAT 
average composite score of 1122. The typ- 
ical student in Groups II and III was in a 
class of seventy-seven, and had an in- 
structor who had been teaching for 6.8 
years. Sixty percent of the students in 
these two groups used one of three widely 
known textbooks, referred to as textbooks 
A, B, and C. The remaining 40 percent 
used one of ten different books, none of 
which was treated separately because of 
the small sample of students using each. 

on the tests. Other members were G. L. Bach, Wil-
liam G. Bowen, Paul L. Dressel (Executive Director),
R. A. Gordon, Bernard F. Haley, Paul A. Samuelson,
John M. Stalnaker (Consultant), and George J. Stig-
ler.

4 Participating colleges and universities, some of
which participated in both micro and macro portions
of the study, were the following: "High-prestige"
schools: Amherst, Dartmouth, Harvard, Haverford,
Oberlin, Swarthmore, and Yale; liberal arts colleges:
Allegheny, Bowdoin, Denison, Lafayette, Union, and
Wesleyan; large universities: Case-Western Reserve,
City College of New York, Illinois, Indiana, Iowa
State, Johns Hopkins, Minnesota, North Carolina, Ok-
lahoma, Oklahoma State, Pennsylvania, Rochester,
University of California, San Diego, Utah, and West
Virginia; state (teachers) colleges: Arizona Western,
California State, Fullerton, Georgia State, New Mex-
ico State and San Diego State; junior colleges: Alle-
gany, Bakersfield, Chabot, Cochise, Everest, Hill, Phoe-
nix, Riverside City, Robert Morris and Triton; norm
group: Butler, Geneva, Montana, State University of
New York at Albany and Virginia Polytechnic Insti-
tute.
Results

The main findings of this study are re-
ported in Table 1 which presents the sta-
tistics for a regression involving all of the
variables discussed above. The dependent 
variable — the number of correct answers 
on the TUCE— had a mean of 17.90 and 
a maximum value of 30. 5 

All of the variables representing stu-
dent characteristics were statistically
significant 6 and quantitatively important.
Each year of educational attainment
added .39 points to the predicted grade,
while being a girl cost .75 points. Not un-
expectedly, student ability was the most
important single determinant of student
performance. Each 100 points on the com-
posite college entrance exam (SAT scale)
was worth 1.03 correct answers on the
TUCE exam.
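
As a worked illustration of how these coefficients combine, the sketch below adds up the quoted effects for a change in student characteristics, holding everything else fixed. The function name, argument names, and the example student are hypothetical; only the three coefficients come from the text.

```python
# Coefficients quoted in the text (points on the TUCE, maximum 30).
COEFF_YEAR = 0.39      # per additional year of educational attainment
COEFF_FEMALE = -0.75   # effect of being a female student
COEFF_SAT_100 = 1.03   # per 100 points of composite SAT score


def predicted_difference(d_years=0, female=False, d_sat=0):
    """Change in predicted test score relative to a baseline student.

    Hypothetical helper: it sums the linear effects reported in the text
    and assumes all other regressors are held constant.
    """
    return (COEFF_YEAR * d_years
            + (COEFF_FEMALE if female else 0.0)
            + COEFF_SAT_100 * d_sat / 100.0)


if __name__ == "__main__":
    # Illustrative case: one more year of college and 200 more SAT points.
    print(round(predicted_difference(d_years=1, d_sat=200), 2))
```

For the illustrative case the predicted gain is about 2.45 questions, most of it from the SAT difference, which matches the statement that ability is the most important single determinant.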

The results for school characteristics 
were somewhat more surprising. First, the 
average freshman entrance exam score was
highly significant and nearly half as im- 
portant to an individual's performance as 
his own entrance exam score. Each 100 
points of the average SAT score of the 
freshman class added .45 correct answers
to the student's score, even after all other
individual student qualities are taken into 
account. It is impossible to tell whether 
this gain was due to associating with 
bright students or attending a high quality 
school that attracts bright students. 

Second, school size had a positive, sta- 
tistically significant coefficient; each 
10,000 in enrollment added 0.2 points to 
the predicted score. Since school size in 
our sample ranged from 500 to 45,000 

5 Actually, the TUCE consists of 33 questions. On
the macro forms, however, we deleted from each
form 6 questions dealing with microeconomics, leav-
ing 27 questions. Two tests, therefore, had 33 ques-
tions each and two had 27 each. Thus, for all four
forms, the unweighted average maximum score was
30.

6 A confidence level of 95 percent will be assumed
throughout the following discussion.



students, going to the largest university 
rather than the smallest school added ap-
proximately .9 of a question to student
performance. We have no satisfactory ex-
planation. It may simply be that larger
schools have better courses and/or better 
instructors as measured by the TUCE. 
Or perhaps, students who attend larger 
schools put their talent to better use. 
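
The size effect quoted above follows directly from the regression coefficient. The sketch below reproduces the arithmetic, using the .02 coefficient per thousand students from the regression table and the enrollment range given in the text; the function and constant names are illustrative assumptions.

```python
# Coefficient on school size from the regression table (per thousand students).
COEFF_PER_THOUSAND = 0.02
# Enrollment range of the sample, as stated in the text.
SMALLEST, LARGEST = 500, 45_000


def size_effect(enrollment):
    """Predicted contribution of school size to the test score, in questions."""
    return COEFF_PER_THOUSAND * enrollment / 1000.0


# Difference between attending the largest and the smallest school.
gap = size_effect(LARGEST) - size_effect(SMALLEST)

if __name__ == "__main__":
    print(f"effect of 10,000 students: {size_effect(10_000):.2f}")
    print(f"largest vs. smallest school: {gap:.2f}")
```

The gap of 0.89 is the "approximately .9 of a question" reported in the text, and 10,000 students correspond to 0.2 points.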

Finally, of the school type variables, 
prestige schools and state colleges had 
large and statistically significant coeffi-
cients. Other things the same, compared
to "other" schools, attendance at a "high
prestige" school cost .78 points while be-
ing at a state college added 1.30 points.
Whether these coefficients reflect differ- 
ences in teaching effort or some other per- 
tinent factor is difficult to say. In evaluat- 
ing how a student would perform at 
schools of different types, however, it 
should be borne in mind that the quality 
(as measured by the average freshman 
SAT score) and size of these schools must 
also be taken into account. Leaving these 
factors free to vary, the influence on test 
score of attending a school of each type, 
as compared to "other" schools, is as fol- 
lows: prestige schools, +.57; large state 
universities, +.71; liberal arts colleges, 
+ 1.08; state colleges, +.8J; and junior 
colleges, -.90. These numbers still sug- 
gest that a student would perform better 
on the TUCE if he had gone to a less pres-
tigious liberal arts college, a state college
or a large university than if he had gone 
to a prestige college or university. 7 

Variables 11-13 show the difference in 

7 A similar classification was used in the study by
G. L. Bach and Phillip Saunders entitled "Lasting
Effects of Economics Courses at Different Types of
Institutions," A.E.R., June, 1966. In this study the
significant finding was that the largest lasting effects 
for elementary economics courses were traceable to 
liberal arts colleges and the least lasting effects to 
large universities. Those results measure performance 
eight years after high school economics teachers took 
the elementary course. 
TABLE 1 

Regression Results 
Dependent variable: test score (mean = 17.90) 
Coefficient of determination = .45 
Standard error of estimate = 3.51 

Independent Variable                               Mean          Regression    "t" 
                                                                 Coefficient   Statistic 

    [label illegible]                                            1.94 
 1. [label illegible]                              .28           .39           5.34 
 2. Sex                                            1.16          [illegible]   -5.01 
 3. Entrance exam (SAT, hundreds)                  11.08         1.03          28.37 
 4. Average school entrance exam (SAT, hundreds)   [illegible]   .45           5.95 
 5. School size (thousands)                        [illegible]   .02           2.33 
 6. Prestige schools                               .16           -.78          -2.40 
 7. [Large state universities]                     .40           -.03          -.11 
 8. Liberal arts colleges                          .13           .28           .92 
 9. State colleges                                 .08           1.30          4.54 
10. Junior colleges                                .15           .00           .00 
11. [Microeconomics test A]                        .24           .55           3.13 
12. Microeconomics test B                          .27           2.47          14.45 
13. Macroeconomics test A                          .27           -.31          -1.94 
14. Conventional text A                            .27           .86           4.68 
15. [Conventional text B]                          .09           .42           1.73 
16. Conventional text C                            .10           .75           2.89 
17. [Years of teaching experience]                 4.67          .01           .26 
18. (Years of teaching experience)^2               68.96         -.00          -.49 
19. 1/Class size                                   .02           .30           .06 
20. P.L. book A + conventional                     [illegible]   .55           3.23 
21. P.L. book B + conventional                     .13           -.37          -1.72 
22. P.L. book A                                    .19           -.02          -.07 
23. P.L. book B                                    .12           -.46          -2.02 


performance on the four test forms used. The standard of comparison is 
macroeconomics test B. Thus macroeconomics test A is .31 of a question more 
difficult. Since the microeconomics tests, however, each had 6 questions more 
and since students on average answered more than 50 percent correct (i.e., 
17.9 out of 30), we would expect, if the tests were homogenous, that the 
coefficient on each of the micro tests would be greater than 3.0 (i.e., the 
average student would be correct on just over 3 of the extra 6 questions). 
Thus microeconomics test A is considerably more difficult than macro B which 
was the easiest of all the tests. It should be recalled that preliminary 
forms of the tests were used. 
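
The expectation of a coefficient above 3.0 can be checked with a few lines of 
arithmetic. The sketch below uses only figures quoted in the text and in 
Table 1; the assignment of the coefficients .55 and 2.47 to microeconomics 
tests A and B rests on a reading of rows 11 and 12 of Table 1 and should be 
treated as an assumption, as should all variable names. 

```python
# Expected coefficient on a micro test form if all forms were equally difficult.
extra_questions = 6     # micro forms had 6 more questions than macro forms
mean_score = 17.9       # mean test score (Table 1)
avg_max_score = 30      # unweighted average maximum score across the four forms

share_correct = mean_score / avg_max_score
expected_gain = extra_questions * share_correct

# Estimated coefficients relative to macro test B (assumed reading of Table 1).
micro_a, micro_b = 0.55, 2.47
shortfall_a = expected_gain - micro_a
shortfall_b = expected_gain - micro_b

print(f"share correct      = {share_correct:.3f}")
print(f"expected gain      = {expected_gain:.2f} questions")
print(f"micro A shortfall  = {shortfall_a:.2f} questions")
print(f"micro B shortfall  = {shortfall_b:.2f} questions")
```

On this reading both micro forms fall short of the benchmark, and micro test 
A by the wider margin, which is the sense in which it is "considerably more 
difficult" than macro B. 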

A surprising result is that teaching experience and class size have no 
observed effect on student performance. 8 What textbook is used, however, 
does appear to be a matter of some importance. Both texts A and C had 
positive significant coefficients. Other things equal, students who used 
these texts scored .86 or .75 points higher than students who used the 
average book in the "all other" category. There is no 

* It should be noted that the result holds even though we tested for 
nonlinear relationships. We recognize, however, that experience is not 
necessarily a good proxy for quality of teaching. 
TABLE 2 

Programmed Learning Compared to Conventional Instruction 

Alternative to Conventional        Difference in Predicted       "t" 
Instruction                        Score (P.L.-average           Statistic* 
                                   conventional) 

[P.L., book A]                     -0.54                         -1.78 
P.L., book A and conventional      0.55                          3.23 
[P.L., book B]                     -1.02                         -4.18 
P.L., book B and conventional      -0.37                         -1.72 

* Calculated from Table 1 and the underlying variance-covariance matrix of 
the regression coefficients. 



significant difference, however, between 
either books A or C and book B. 

In evaluating the effectiveness of programmed learning taken by itself, the 
results are somewhat difficult to read from Table 1. The coefficients shown 
in Table 1, -.02 for P.L. book A and -.46 for P.L. book B, reflect the 
difference between programmed learning and a conventional course, assuming 
that variables 14-19 have a value of zero. In the case of textbooks, for 
example, this means that students in Group III are assumed to be using a text 
other than books A, B, or C. To make a comparison with the typical 
conventional course, it is necessary to give appropriate weight to the 
coefficients for variables 14-19 in Table 1. This comparison is shown in 
Table 2. What these figures indicate is that, other things the same, students 
using programmed learning did half a question worse, in the case of P.L. 
book A, and one question worse in the case of P.L. book B, than did students 
in the average conventional course. In the case of P.L. book A, however, this 
difference is not statistically significant. 

As for the effectiveness of programmed learning when it is used as a 
supplement, P.L. book A added significantly, about half a question, to 
student performance, whereas P.L. book B did not. These results are shown in 
both Tables 1 and 2. 

The questions on the TUCE were classified into the following categories: 
recognition and understanding; simple application; and complex application. 
Questions in the first category required knowledge of relatively more 
historical and institutional material, while questions in the latter two 
categories usually stated all the necessary factual material and tested the 
ability to use economic analysis. Separate regressions were run to explain 
student performance on each type of question. The statistics for these 
regressions are shown in Table 3 and their implications for the comparison 
between programmed learning and conventional instruction are shown in 
Table 4. The most striking result is that, for P.L. book A, most of the 
overall difference between student performance in Groups I and III was 
concentrated in the recognition and understanding category. Since the 
programmed learning texts were designed primarily to teach basic tools and 
not all basic introductory and institutional material covered by the typical 
course or the TUCE, the differences in this category are not surprising. For 
those questions requiring an ability to apply economic theory, however, 
students who used P.L. book A only for 12 hours, on average, were an even 
match for conventionally taught students. 

In addition to the objective information 
already discussed we solicited student 
reactions to programmed learning. The re- 
AMERICAN ECONOMIC ASSOCIATION 
TABLE 3 



Regression Results by Type of Question 

                                     Recognition and        Simple                 Complex 
                                     Understanding          Application            Application 

Independent Variables                Regr.      "t"         Regr.      "t"         Regr.      "t" 
                                     Coeff.     Statistic   Coeff.     Statistic   Coeff.     Statistic 

    [label illegible]                1.35                   -.05                   .66 
 1. [label illegible]                .12        3.91        .12        3.67        .19        5.22 
 2. [Sex]                            -.34       -5.44       -.21       -3.02       -.27       -3.50 
 3. Entrance exam (SAT, hundreds)    .28        18.20       .37        22.12       .38        20.15 
 4. School entr. exam (SAT, hundreds) .12       [illegible] .08        3.80        .11        4.52 
 5. [School size]                    .01        4.63        .01        2.29        .00        1.34 
 6. [Microeconomics test A]          .04        .62         1.33       17.15       -.75       -8.51 
 7. [Microeconomics test B]          -.07       -.96        2.57       34.41       -.07       -.85 
 8. Macroeconomics test A            .83        12.37       -.19       -2.64       -.96       -11.46 
 9. Conventional text A              .26        3.47        .14        1.71        .27        2.92 
10. [Conventional text B]            .28        2.88        .10        .97         -.26       -2.18 
11. Conventional text C              .31        3.12        -.05       -.43        .07        .59 
12. [P.L. book A + conventional]     .19        2.65        .14        1.76        .21        2.41 
13. [P.L. book B + conventional]     -.09       -1.03       -.14       -1.54       -.10       -.94 
14. [P.L. book A]                    -.24       -2.62       -.05       -.49        .10        .92 
15. P.L. book B                      -.15       -1.76       -.24       -2.53       -.27       -2.46 

R^2                                  .31                    .45                    .31 
S                                    1.47                   1.61                   1.83 




sponses to the questions concerning effectiveness and interest are 
summarized in Table 5. Several aspects of these distributions stand out. 
First, the mean response was favorable, but there was a wide dispersion of 
opinions. Second, students considered programmed learning to be somewhat more 
effective than interesting. On average, the students gave the books grades of 
"good-minus" for effectiveness and "average-plus" for interest. Finally, 
although the objective evidence revealed a 



TABLE 4 



Programmed Learning Compared to Conventional Instruction, by Type of Question 
Difference in predicted score* (P.L.-average conventional) 

Alternative to                 Recognition and          Simple                   Complex 
Conventional                   Understanding            Application              Application 
Instruction 
                               Difference  "t"          Difference  "t"          Difference  "t" 
                                           Statistic                Statistic                Statistic 

[P.L. A]                       -.42        -4.29        -.11        -1.0[?]      .02         .17 
P.L. A and conventional        .15         2.65         .14         1.76         .21         2.41 
[P.L. B]                       -.34        -3.58        -.32        -3.08        -.36        -3.05 
P.L. B and conventional        -.09        -1.03        -.14        -1.54        -.10        -1.13 

* Calculated from Table 3. The "t" statistics are calculated using 
information from the variance-covariance matrix of the regression 
coefficients for the regression reported in Table 3. 




EFFICIENCY OF EDUCATION IN ECONOMICS 



TABLE 5 

Student Attitudes Toward Programmed Learning* 

As a way of learning economic theory what do you think of the programmed 
text? 

                        Group I     Group II 
[Very good]             20%         18% 
Good                    42          43 
[Average]               23          25 
Poor                    12          10 
Very poor               3           4 

How interesting was the programmed text? 

                        Group I     Group II 
[label illegible]       6%          3% 
[label illegible]       41          32 
[label illegible]       34          42 
Uninteresting           12          15 
[label illegible]       6           7 

* Figures may not add to 100 because of rounding or failure to receive 
answers. 



significant difference between the two books, this was not reflected in the 
students' opinions about them. 
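
The "good-minus" and "average-plus" summary grades can be roughly recovered 
from the Table 5 distributions. The sketch below is illustrative only: it 
assumes that the five response categories in each panel run from most to 
least favorable and scores them 5 to 1, a scale that is not stated in the 
text. 

```python
# Hypothetical scoring: 5 = most favorable response ... 1 = least favorable.
SCORES = [5, 4, 3, 2, 1]

def mean_grade(percentages):
    """Weighted mean score; renormalized because a column may not sum to 100."""
    total = sum(percentages)
    return sum(s * p for s, p in zip(SCORES, percentages)) / total

# Percentages as printed in Table 5, top row to bottom row.
effectiveness = {"Group I": [20, 42, 23, 12, 3], "Group II": [18, 43, 25, 10, 4]}
interest = {"Group I": [6, 41, 34, 12, 6], "Group II": [3, 32, 42, 15, 7]}

for label, table in (("effectiveness", effectiveness), ("interest", interest)):
    for group, pct in table.items():
        print(f"{label:13s} {group:8s} mean = {mean_grade(pct):.2f}")
```

Under this scoring the effectiveness means fall somewhat below 4 ("good") and 
the interest means somewhat above 3 ("average"), in line with the grades 
reported in the text. 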

Conclusion 

We feel that these results have important implications for the organization 
and teaching of the introductory course. Within the profession many believe 
that the introductory course should prepare a student to think intelligently 
about major economic problems in modern society and that this goal can best 
be accomplished by teaching a few basic principles and applying them to a 
number of important problems. We are in agreement with this view. This study 
has shown that by using programmed learning materials the basic micro- and 
macroeconomic theory can be taught in a relatively short period of time. 
Therefore, more time can be devoted to teaching students how to apply the 
theory to social problems, both by going more deeply into the more important 
problems and by actually covering those topics scheduled for the end of the 
term that often fall victim to the school calendar. The use of these 
materials can have other advantages: First, the student can gain a good 
overview of the entire course at the very beginning, which helps him to put 
topics covered in the remainder of the course in meaningful perspective. 
Second, because a course taught in this manner emphasizes the usefulness of 
economic theory in a problem solving context, it promises a positive impact 
on the most important single factor in the learning process; namely, student 
attitude toward the subject. 
[13] 

The Learning and Cost Effectiveness of AVT 
Supplemented Instruction: Specification of 
Learning Models 

William E. Becker and Michael K. Salemi 



The self-paced audiovisual tutorial (AVT) approach of instruction has been acclaimed by 
many educators as an efficient method of increasing student learning. 1 The rationale given for 
expecting increased learning includes the belief that: 

1. Instructors will be able to extricate themselves from transmitting simple information 
and theory, thereby allowing themselves to spend time on higher levels of application; 
and 

2. Students will be able to allocate their time to those areas which they find difficult and 
away from those of less difficulty. 2 

The research described here is designed to examine whether the integration of an AVT program 
in the Community College economic principles course does affect the quantity, cost and 
efficiency of student learning. 

This research endeavors to answer four questions. First, can a learning model be specified, 
on the basis of formal theoretical and statistical grounds, within which learning can be 
examined in control-experimental groups? Unlike earlier studies, this study develops a 
theoretical and statistical model for equations estimated. 3 Second, is there any difference in the 
quantity of learning as measured by the Test of Understanding in College Economics (TUCE) 
between control groups and experimental groups using the AVT approach? This question is 
typical of those raised in control-experimental testing. 4 Third, what influence does student 
classroom and study time have on learning? In this study explicit account is taken of the fact that 
student time is an input to student learning. 5 Fourth, is learning produced in the control 
sections? Unlike most other economic education studies, this study utilizes human capital 
techniques to assign value to time. 

The sample data for this study were collected during the 1973-74 academic year at six 
urban and rural community colleges — three located in Minnesota and three in Missouri. At 
each college, the same instructor taught both a control section using his regular teaching plan 
and an experimental section in which student use of the David A. Martin AVT package, 
Introductory Economic Theory, was completely substituted for the instructor's presentation of 
several theoretical topics. 

William E. Becker is Associate Professor and Director of Economic Education, University of 
Minnesota, Minneapolis, and Michael K. Salemi is Assistant Professor of Economics, University of North 
Carolina. They jointly accept responsibility for all aspects of the research. Darrell Lewis and Bill Walstad 
provided constructive criticism of an earlier version of this paper. De Von Yoho deserves special 
recognition for his efforts in arranging for Missouri Community Colleges which participated in this study. This 
research was supported by a grant from the Exxon Foundation and the Joint Council on Economic 
Education. 

Source: Journal of Economic Education, vol. 8, no. 2, Spring 1977, pp. 77-92. Reprinted with permission of the 
In the first phase of the study it is hypothesized that community college economic 
principles course learning (measured as a student's post-TUCE score minus his pre-TUCE 
score) is linearly related to aptitude (as measured by the pre-TUCE score), a student time input 
(as reported in weekly questionnaires) and a set of dummy variables characterizing the students 
and their learning situation. 6 The hypothesis that there is no difference in learning between 
control and experimental groups is accepted. Also it is shown that student study time has little 
effect on learning and that learning and pre-TUCE scores are negatively correlated. 

Other researchers have been confronted with a negative coefficient for pre-TUCE in 
similar "value added" learning models and have related this to "nonlinearity" and the TUCE 
"ceiling effect." 7 These researchers usually specify alternative ad hoc specifications. 8 In the 
second phase of this study, it is shown that a formal modeling of a ceiling effect with pre-TUCE 
as a positive regressor in a "value-added model" can be accomplished, that such a modeling 
involves simultaneous equations in an unobserved component, and that an implication of this 
approach is that ordinary least squares regression (OLS) estimates are biased and inconsistent. 
An instrumental variable procedure is undertaken to estimate a nonlinear learning model and to 
retest the hypothesis that there was no difference in learning (change in TUCE) between control 
and experimental groups. 

In addition to examining the quantity of learning produced in control and experimental 
sections, the study attempts to determine whether the costs of learning are different in the 
control and experimental groups. An appropriate cost concept is developed and the cost of 
learning is examined for control and experimental sections. The null hypothesis that there is no 
difference in the average cost of learning per TUCE point between the two groups is accepted at 
the 0.05 Type I error level. 



Experimental Design 

This section briefly describes both the sample and course design used. For a complete 
description of operational procedures used in data collection and a detailed course description, 
the reader is advised to obtain our 1973-1976 yearly program reports to the Joint Council on 
Economic Education. 9 



Sample 

This study involves 330 students from six two-year community colleges, three in 
Minnesota and three in Missouri. Although these schools differed in terms of geographic area, they 
were quite similar in terms of philosophy, objectives, resources, type of student body and type 
of introductory economics course. 

The Minnesota courses were quarter courses while the Missouri courses were semester 
courses. Both the quarter and semester courses, however, covered the same amount of 
materials in roughly the same amount of time. The instructors of each school's introductory 
economics course were quite similar in terms of academic credentials, teaching experience and 
"willingness" to become involved in a study of the AVT method of instruction. Each of the six 
instructors volunteered to teach both a control and an experimental class at his respective 
school. 10 

All students were required to take a pre- and postcourse test of their understanding of 
college economics (TUCE). They completed a pre- and postcourse questionnaire identifying 
personal characteristics and attitudes. On the basis of the prequestionnaire data, Walstad [35] 
found that students in the control and experimental classes of each school were similar in terms 
of key characteristic measures (age, sex, aptitude, educational background, outside 
employment, etc.). Students as well as instructors were also required to complete weekly 
questionnaires reporting on time usage during the week. 

Each instructor carried out the administration and collection procedures at his institution. 
Since the questionnaire and test-scoring was done at the University of Minnesota and because 
special handling and mailing procedures were used, individual instructors did not have access 
to raw data. 

Experimental Course 

Before starting the program in the winter of 1974, the six instructors received background 
information about the program, attended a special workshop to learn about the AVT package to 
be used and became acquainted with the procedures of this study. Using the David Martin, 
Introductory Economic Theory, audiovisual-tutorial package, the six instructors, together with 
their school's media experts, set up study areas at each of the schools. Each school's study area 
contained several tape playback machines, slide projectors, viewing screens and study carrels. 
Several copies of the Martin material (slides, carrousels and audiotapes) were kept in the study 
area with each experimental student receiving his or her own copy of the Martin workbook. The 
study areas were centrally located on campus and had a wide range of hours during which they 
were open. 

Each experimental course was designed to be nine classroom sessions shorter than the 
control course and each instructor was free to schedule the omitted sessions. Content covered in 
units two, three, four and five of the Martin package was not explicitly covered in the 
experimental classrooms, so that experimental group students learned the material in these 
units at their own initiative and on their own time with the aid of the Martin AVT package. Each 
instructor could make use of the omitted content in classroom discussions of current issues but 
was asked not to give classroom assistance or explicit attention to teaching the content and 
concepts covered in the relevant Martin units. Students were permitted to see the instructors for 
individual help outside of the classroom. Except as provided above each instructor's control 
and experimental classes were taught in a similar way. 

Experimental Results with Linear Learning Models 

In general, this section and the following one focus on a relatively simple model of 
learning: 

(1) L = f[A, T, S, u] 

where L = Learning 
A = Aptitude 
T = Time input 
S = Situation 
u = Random error 

The model says that learning is to be explained as dependent, except for an error 
component, on a student's aptitude, the time the student spends in class and in study, and the 
situation or environment within which he learns. The function f may be thought of as a 
production function with A a measure of human capital, T a measure of labor time, and S 
characterizing the "physical learning plant." Variables such as age and sex are not included 
explicitly inasmuch as their contribution, if any, should be reflected in A. 11 Within the context 
of specific versions of this model the question is asked: "Did participation in the experimental 
sections (those with the Martin AVT component) make a difference in learning?" 

Estimation (Time Suppressed) 

The theoretical learning function given in (1) can take numerous algebraic forms; a linear 
model with time temporarily suppressed is: 
(2) $\text{D TUCE} = \sum_{i=1}^{6}\sum_{j=0}^{1} \alpha_{ij}\,\text{GROUP}(i,j) + \sum_{i=1}^{6}\sum_{j=0}^{1} \beta_{ij}\,\text{GROUP}(i,j)\cdot\text{Apt} + u$ 

where D TUCE = the difference between post-TUCE and pre-TUCE scores 
GROUP (i, j) = 1 if a student attended school i and participated in section j 
(j = 0 for control, j = 1 for experimental) 
= 0 otherwise 
Apt = pre-TUCE 

Learning is measured by D TUCE while pre-TUCE is used as a proxy for the students' 
unobserved aptitude to learn economics. A student's learning situation is described by the 
school which he attended and the section in which he was placed. Aptitude enters the equation 
in a way which would permit the discovery of interaction with the situational variables if any 
exists. Table 1 summarizes the OLS regression results. 
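
To make the structure of equation (2) concrete, the following sketch builds 
the 24 regressors for one student: twelve GROUP (i, j) dummies followed by 
twelve GROUP (i, j) x Apt interaction terms. The function name, the argument 
names and the ordering of the columns are illustrative assumptions and are 
not taken from the study. 

```python
def design_row(school, section, pre_tuce, n_schools=6):
    """Regressors of equation (2) for one student (hypothetical helper).

    school   : 1..n_schools
    section  : 0 for control, 1 for experimental
    pre_tuce : pre-TUCE score, used as the aptitude proxy Apt
    Returns 2*n_schools group dummies followed by 2*n_schools interactions.
    """
    if not 1 <= school <= n_schools:
        raise ValueError("school out of range")
    if section not in (0, 1):
        raise ValueError("section must be 0 (control) or 1 (experimental)")

    n_groups = 2 * n_schools
    index = 2 * (school - 1) + section      # position of this student's group
    dummies = [0.0] * n_groups
    dummies[index] = 1.0                    # GROUP(i, j) = 1 for own group only
    interactions = [d * pre_tuce for d in dummies]   # GROUP(i, j) * Apt
    return dummies + interactions

# A student in the experimental section of school 3 with a pre-TUCE of 12.
row = design_row(school=3, section=1, pre_tuce=12.0)
print(len(row), row.index(1.0), row.index(12.0))
```

Because every student belongs to exactly one group, the dummies sum to one in 
each row and no separate intercept is needed, which is why equation (2) has 
none. 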

The null hypothesis that participation in the experimental A VT sections had no effect on 
learning is tested against the alternative hypothesis used in estimating (2). The appropriate 
constrained regression to estimate is: 

(3) $\text{D TUCE} = \sum_{i=1}^{6} \alpha_i\,\text{School}(i) + \sum_{i=1}^{6} \beta_i\,\text{School}(i)\cdot\text{Apt}$ 

where School (i) = 1 if the student attended School (i) 
= 0 otherwise 
Apt = pre-TUCE 

Interaction is once again permitted between the aptitude proxy variable pre-TUCE and the 
situational variable School (i) (see Table 1). 

The F-statistic for the test of the null against the alternative hypothesis was estimated to be 
1.26 which is not significant at the .05 level. In fact, P { X ~ F (12,306) > 1.26 } = .24. The 
hypothesis that participation in the experimental section had no differential effect on learning 
cannot be rejected. It is interesting to observe, however, that learning is significantly different 
across schools. In addition, a striking feature of the results is the ever-present negative 
coefficient on pre-TUCE which casts doubt on its role as a measure of aptitude in the system as 
written in (2). 
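
The reported F-statistic can be reproduced from the residual sums of squares 
printed in Table 1 using the standard formula for testing linear 
restrictions. The sketch below assumes that formula is the one applied; the 
function and variable names are illustrative. 

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F = [(RSS_r - RSS_u) / q] / [RSS_u / (N - k)] for q linear restrictions."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator

# Table 1: constrained regression (3) against unconstrained regression (2).
n_obs = 330
k_unrestricted = 24      # 12 group dummies + 12 interactions with pre-TUCE
k_restricted = 12        # 6 school dummies + 6 interactions with pre-TUCE

f_value = f_statistic(
    rss_restricted=5558,
    rss_unrestricted=5296,
    n_restrictions=k_unrestricted - k_restricted,   # 12
    df_unrestricted=n_obs - k_unrestricted,         # 306
)
print(round(f_value, 2))
```

With these inputs the statistic rounds to the 1.26 quoted in the text, on 12 
and 306 degrees of freedom. 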



Estimation (Time as an Input) 

Since it is taken to be the labor input of a learning production function (1), the time a 
student devotes to the economics course should have an effect on learning. A linear learning 
model, given the hypothesis that time is an input to learning, is: 
(4) $\text{D TUCE} = \sum_{i=1}^{6}\sum_{j=0}^{1} \left[ \alpha_{ij}\,\text{GROUP}(i,j) + \beta_{ij}\,\text{GROUP}(i,j)\cdot\text{Apt} + \gamma_{ij}\,\text{GROUP}(i,j)\cdot\text{TIME} \right] + u$ 

GROUP (i, j) and Apt are defined as before and TIME is defined as the course average of 
weekly responses to the questionnaire item "How much total time (classroom time, reading 
time, study time, etc.) did you devote to this economics course this past week?" 

In this case, the student's time input in the course is allowed to explain learning and 
interaction between time input and the situational variable is permitted. The appropriate 
constrained equation for a test of the null hypothesis that participation in the experimental 
Table 1 

Dependent Variable = D TUCE 

Variable            Coefficient       Standard Error 

Unconstrained Regression (2) 

GP (0, 1)           5.[illegible]     2.83 
GP (1, 1)           7.1[illegible]    3.06 
GP (0, 2)           7.68              2.74 
GP (1, 2)           9.62              3.11 
GP (0, 3)           7.87              2.49 
GP (1, 3)           5.14              2.54 
GP (0, 4)           11.05             2.91 
GP (1, 4)           7.50              3.15 
GP (0, 5)           12.07             2.48 
GP (1, 5)           10.52             2.63 
GP (0, 6)           10.29             4.00 
GP (1, 6)           14.55             3.16 
PREGP (0, 1)        -.18              .21 
PREGP (1, 1)        -.24              .22 
PREGP (0, 2)        -.18              .20 
PREGP (1, 2)        -.42              .26 
PREGP (0, 3)        -.43              .19 
PREGP (1, 3)        -.29              .18 
PREGP (0, 4)        -.77              .24 
PREGP (1, 4)        -.25              .27 
PREGP (0, 5)        -.81              .21 
PREGP (1, 5)        -.60              .22 
PREGP (0, 6)        .11               .34 
PREGP (1, 6)        -.49              .27 

R^2 = .345 
F (24,306) = 19.85 
Residual sum of squares = 5296 
N = 330 

[Constrained Regression (3)] 

Sch 1               6.26              2.07 
Sch 2               7.91              1.98 
Sch 3               6.65              1.78 
Sch 4               9.80              2.14 
Sch 5               11.32             1.81 
Sch 6               12.94             2.49 
PRESCH 1            -.20              .15 
PRESCH 2            -.23              .16 
PRESCH 3            -.38              .13 
PRESCH 4            -.55              .18 
PRESCH 5            -.71              .15 
PRESCH 6            -.25              .21 

R^2 = .313 
F (12,318) = 38.07 
Residual sum of squares = 5558 
N = 330 

section had no differential effect on learning is: 

(5) $\text{D TUCE} = \sum_{i=1}^{6} \left[ \alpha_i\,\text{School}(i) + \beta_i\,\text{School}(i)\cdot\text{Apt} + \gamma_i\,\text{School}(i)\cdot\text{Time} \right] + u$ 

The OLS regression results are summarized in Table 2. 



Table 2 

Dependent Variable = D TUCE 

Variable            Coefficient       Standard Error 

Unconstrained Regression (4) 

GP (0, 1)           8.65              3.67 
GP (1, 1)           3.07              3.55 
GP (0, 2)           5.61              3.14 
GP (1, 2)           12.88             4.25 
GP (0, 3)           6.44              3.65 
GP (1, 3)           5.02              2.71 
GP (0, 4)           8.42              5.11 
GP (1, 4)           2.45              4.78 
GP (0, 5)           16.48             3.76 
GP (1, 5)           7.77              3.97 
GP (0, 6)           12.44             8.10 
GP (1, 6)           8.84              4.86 
PREGP (0, 1)        -.20              .21 
PREGP (1, 1)        -.36              .22 
PREGP (0, 2)        -.19              .20 
PREGP (1, 2)        -.49              .27 
PREGP (0, 3)        -.43              .19 
PREGP (1, 3)        -.29              .18 
PREGP (0, 4)        -.73              .25 
PREGP (1, 4)        -.31              .27 
PREGP (0, 5)        -.90              .21 
PREGP (1, 5)        -.57              .22 
PREGP (0, 6)        .19               .41 
PREGP (1, 6)        -.49              .26 
TIMEGP (0, 1)       -.42              .34 
TIMEGP (1, 1)       1.00              .45 
TIMEGP (0, 2)       .34               .26 
TIMEGP (1, 2)       -.38              .34 
TIMEGP (0, 3)       .24               .46 
TIMEGP (1, 3)       .03               .22 
TIMEGP (0, 4)       .42               .67 
TIMEGP (1, 4)       1.14              .82 
TIMEGP (0, 5)       -.65              .42 
TIMEGP (1, 5)       .42               .46 
TIMEGP (0, 6)       -.48              1.59 
TIMEGP (1, 6)       1.07              .70 

R^2 = .38 
F (36,294) = 13.98 
Residual sum of squares = 4994 

[Constrained Regression (5)] 

Sch 1               5.98              2.58 
Sch 2               7.24              2.46 
Sch 3               6.26              2.06 
Sch 4               5.64              3.50 
Sch 5               11.92             2.73 
Sch 6               6.72              3.86 
PRESCH 1            -.20              .15 
PRESCH 2            -.23              .16 
PRESCH 3            -.38              .13 
PRESCH 4            -.53              .18 
PRESCH 5            -.71              .15 
PRESCH 6            -.33              .21 
TIMESCH 1           .05               .26 
TIMESCH 2           .10               .21 
TIMESCH 3           .08               .20 
TIMESCH 4           .76               .51 
TIMESCH 5           -.09              .31 
TIMESCH 6           1.25              .59 

R^2 = .33 
F (18,312) = 25.87 
Residual sum of squares = 5434 



The F-statistic for the test of the null hypothesis is estimated to be 1.44 with P { X ~ F 
(18,294) > 1.44 } = .11. The null hypothesis that there is no differential effect cannot be 
rejected at the .05 level. In this case, the time input as measured in this study does not make a 
significant contribution to the explanation of learning. The F-statistic which tests the null 
hypothesis represented by equation (3) against the alternative (4) is estimated to be 1.39 with 
P { X ~ F (24,294) > 1.39 } = .11. 

It is important to note that the above-given results do not mean that there was no 
statistically significant difference in learning between control and experimental sections at any 
of the six schools. Rather, the results say that taken together there was no difference in learning 
between control and experimental sections. 
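
Both F-statistics in this subsection can be checked against the residual sums 
of squares printed in Tables 1 and 2. The sketch below assumes the usual 
restricted-versus-unrestricted formula; the names are illustrative and the 
inputs are the rounded sums of squares as printed. 

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F = [(RSS_r - RSS_u) / q] / [RSS_u / (N - k)] for q linear restrictions."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    return numerator / (rss_unrestricted / df_unrestricted)

rss_eq3 = 5558   # constrained regression (3), Table 1
rss_eq4 = 4994   # unconstrained regression (4), Table 2
rss_eq5 = 5434   # constrained regression (5), Table 2
df_eq4 = 330 - 36    # 294 residual degrees of freedom in regression (4)

# No differential effect of the experimental section: (5) against (4).
f_section = f_statistic(rss_eq5, rss_eq4, n_restrictions=18, df_unrestricted=df_eq4)

# Equation (3) against the alternative (4): 24 restrictions.
f_eq3_vs_eq4 = f_statistic(rss_eq3, rss_eq4, n_restrictions=24, df_unrestricted=df_eq4)

print(round(f_section, 2), round(f_eq3_vs_eq4, 2))
```

The first value rounds to the reported 1.44. The second comes out at about 
1.38 rather than the reported 1.39, a gap that presumably reflects rounding 
in the printed sums of squares. 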

Nonlinear Learning Models and Correction for Simultaneous Equations Bias 

In this section the implications of fitting a learning model like (1) are carefully studied. In 
particular, the learning equation is viewed as the reduced form of a simultaneous equations 
system in which an explicit relation is posited between aptitude and pre-TUCE. The fact that 
there is a maximum pre-TUCE score and a maximum to TUCE measurable learning is 
incorporated in the model. 

Pre-TUCE as an Aptitude Proxy 

The learning function in (1) related learning output (D TUCE) to the inputs aptitude, A, 
situation, S, labor time, T, and an error term, u. 
(1) D TUCE = f(A, S, T, u) 

In the previous section it was simply assumed that pre-TUCE was some type of proxy for 
aptitude which is unobserved. At this point, the relationship between pre-TUCE and aptitude is 
assumed explicitly to be: 

(6) pre-TUCE = h (Aptitude, v) 
where v is a random error. 

Solving (6) for aptitude and substituting in (1) gives the reduced form 

(7) D TUCE = k (pre-TUCE, S, T, w) 

The reduced form error w is made up of the shocks u and v which implies, in general, that w will 
be correlated with pre-TUCE. 

Assuming explicit linear equation forms, such as those used in the previous section, the 
two-equation system would be given by (8) and (9) with reduced form (10): 

(8) $\text{D TUCE} = a + b\,\text{Aptitude} + c'\,\text{Situation} + d\,\text{Time} + u$ 

(9) $\text{pre-TUCE} = \gamma + \theta\,\text{Aptitude} + v$ 

(10) $\text{D TUCE} = a' + b'\,\text{pre-TUCE} + c'\,\text{Situation} + d\,\text{Time} + w$ 

where $a' = (a - \gamma b/\theta)$ 
$b' = b/\theta,\ \theta > 0,\ b > 0$ 
c' = a vector of coefficients conformable to the situation vector 
$w = u - (b/\theta)\,v$ 

A necessary condition for OLS to give consistent estimates of the parameters in (10) is that 
regressors are uncorrelated with w. However, 

$E(\text{pre-TUCE} \cdot w) = E\left[(\gamma + \theta\,\text{Aptitude} + v)\left(u - \frac{b}{\theta}\,v\right)\right] = -\frac{b}{\theta}\,E[v^2] < 0$ 

Therefore, OLS estimates of b' are biased and inconsistent. 12 It is this source of simultaneous 
equation bias which may account for the highly significant negative pre-TUCE coefficient 
estimates reported in the previous section. 13 

An appropriate remedy for the problem of simultaneous equation bias is the use of an 
instrumental variable procedure such as two stage least squares (TSLS). The results of 
reestimating a basic linear learning model via TSLS are given in Table 3. 
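
The direction of the bias, and the way an instrument removes it, can be 
illustrated with a small simulation of equations (8) and (9). Everything in 
the sketch is hypothetical: the parameter values, the omission of the 
situation and time terms, and above all the instrument, which is taken to be 
a second noisy measure of aptitude whose error is independent of u and v. The 
instruments actually used in the study are not described in this section. 
With a single instrument and a single regressor, two stage least squares 
reduces to the simple ratio of covariances used below. 

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((p - mx) * (q - my) for p, q in zip(x, y)) / (len(x) - 1)

random.seed(0)
n = 20000
a, b, gamma, theta = 5.0, 1.0, 10.0, 1.0    # hypothetical parameter values

aptitude = [random.gauss(0, 1) for _ in range(n)]                     # unobserved
pre_tuce = [gamma + theta * A + random.gauss(0, 1) for A in aptitude] # eq. (9)
d_tuce = [a + b * A + random.gauss(0, 1) for A in aptitude]           # eq. (8), no S or T
instrument = [A + random.gauss(0, 1) for A in aptitude]               # hypothetical

# OLS of D TUCE on pre-TUCE: pre-TUCE is correlated with w = u - (b/theta) v.
ols_slope = cov(pre_tuce, d_tuce) / cov(pre_tuce, pre_tuce)

# Instrumental-variable estimate: the instrument is uncorrelated with w.
iv_slope = cov(instrument, d_tuce) / cov(instrument, pre_tuce)

print(f"true b/theta = {b / theta:.2f}")
print(f"OLS slope    = {ols_slope:.2f}")
print(f"IV slope     = {iv_slope:.2f}")
```

In this linear setup the OLS estimate is pulled below b/theta, consistent 
with the negative covariance derived above, while the instrumental-variable 
estimate is centered on b/theta. The linear system by itself biases the 
coefficient downward but does not force it below zero. 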



Table 3 

Two Stage Least Squares Estimates of Linear Learning Model 
Dependent Variable = D TUCE 

Variable      Coefficient     Standard error 

Pre-TUCE      .22             .23 
School 1      -.17            3.06 
School 2      2.45            2.99 
School 3      -1.55           3.08 
School 5      .40             2.65 
School 6      6.69            2.75 
Conex         -.56            .63 
Time          .17             .45 

R^2 = .11     N = 219     Residual sum of squares = 4421 









It worth noting that the TSLS pre-TUCE coefficient estimate is positive (although not 
significantly so) which is consistent with the assumption that pre-TUCE is a proxy for aptitude. 
Once again, however, there is no discernible difference between the learning in the control and 
experimental groups since the coefficient of the Control-Experimental Dummy Variable 
(Conex) is not significantly different from zero. The coefficient of student time is also estimated 
to be insignificantly different from zero. 
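
The bias argument and the TSLS remedy discussed above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the situation and time terms of (8) are dropped, all parameter values are invented, and the `instrument` series is a hypothetical general-aptitude score that is correlated with aptitude but not with u or v. It is not a reconstruction of the authors' data or estimation code.

```python
import random


def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def simulate(n=20000, b=2.0, theta=1.5, seed=0):
    """Draw data from simplified versions of equations (8) and (9)."""
    rng = random.Random(seed)
    d_tuce, pre_tuce, instrument = [], [], []
    for _ in range(n):
        aptitude = rng.gauss(0.0, 1.0)  # unobserved by the researcher
        u = rng.gauss(0.0, 1.0)         # error in the learning equation
        v = rng.gauss(0.0, 1.0)         # error in the pre-TUCE equation
        # Hypothetical general-aptitude score used as the instrument.
        instrument.append(aptitude + rng.gauss(0.0, 1.0))
        pre_tuce.append(10.0 + theta * aptitude + v)  # equation (9)
        d_tuce.append(1.0 + b * aptitude + u)         # equation (8), no situation/time
    return d_tuce, pre_tuce, instrument


def ols_slope(y, x):
    """OLS slope of y on x (with an intercept)."""
    return covariance(x, y) / covariance(x, x)


def tsls_slope(y, x, z):
    """With one regressor and one instrument, TSLS reduces to cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


d_tuce, pre_tuce, instrument = simulate()
true_slope = 2.0 / 1.5  # b / theta, the coefficient on pre-TUCE in (10)
ols = ols_slope(d_tuce, pre_tuce)
tsls = tsls_slope(d_tuce, pre_tuce, instrument)
print(f"b/theta = {true_slope:.3f}, OLS = {ols:.3f}, TSLS = {tsls:.3f}")
```

In this setup the OLS slope falls well below b/theta because pre-TUCE is negatively correlated with w, while the instrumental variable estimate stays close to b/theta.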

The Gap-Closing Model 

Researchers in models of economic learning have puzzled over the negative slope 
coefficient estimate for pre-TUCE in a regression of (post-TUCE - pre-TUCE) on pre-TUCE and other 
variables. Some have suggested that the negative coefficient is evidence of a ceiling effect 
created by the fact that there is a maximum possible score on any testing instrument. But we 
have seen that the negative coefficient may be spuriously due to simultaneous equation bias. 
What follows is an attempt to model such an effect explicitly. 14 

It seems reasonable to assume that the first derivative of (6) with respect to aptitude is 
positive while the second is negative for pre-TUCE scores ranging over the set [0, 33]. An 
explicit equational form which has such characteristics is the logistic (11). 

(11) $\text{pre-TUCE} = 33 \left[ \dfrac{A}{1 + A} \right]^{s} e^{v}$ 

where A = aptitude 

v = random error 
e = 2.71828 

In (11) with s > 0, A → ∞ gives pre-TUCE → 33 at a decreasing rate over the relevant domain. 

There is also a maximum to learning as defined in model (1); with a zero pre-TUCE one 
can at most gain 33 points with a perfect test score. Equation (12) reflects the fact that even with 
a very high aptitude and a very long effort a student will at best score well enough on the posttest 
to close the gap between a perfect score and his pretest score. 

(12) $D\,TUCE = (33 - \text{pre-TUCE}) \left[ \dfrac{A}{1 + A} \right]^{r} \left[ \dfrac{T}{1 + T} \right]^{z} \prod_{i,j} (S_{ij})^{F_{ij}}\, e^{u}$ 

where $F_{ij}$ = 1 if the student belongs to school i and participated in section j, 
and 0 otherwise 

$S_{ij}$ = the multiplicative effect on learning of belonging to group (i, j); $S_{ij} > 0$ 

With r greater than zero, as A → 0, D TUCE → 0 and as A → ∞, D TUCE → TUCE Gap 
times a constant which permits the effect of aptitude on D TUCE to be different in different 
groups for a given time commitment. Similarly, with z greater than zero, as T → 0, D TUCE → 0 
and as T → ∞, D TUCE → TUCE Gap times a constant which permits the effect of time on D 
TUCE to be different in different groups for given aptitude levels. As either A or T rise, D 
TUCE rises at a decreasing rate over the relevant domain. 

Solving (11) for aptitude and substituting into (12) yields 

(13) $D\,TUCE = (33 - \text{pre-TUCE}) \left[ \dfrac{\text{pre-TUCE}}{33} \right]^{r/s} \left[ \dfrac{T}{1 + T} \right]^{z} \prod_{i,j} (S_{ij})^{F_{ij}}\, e^{u - rv/s}$ 

Dividing both sides of (13) by (33 - pre-TUCE) and rearranging factors gives the familiar 
"Gap-Closing" measure as the dependent variable. 15 

(14) $\dfrac{D\,TUCE}{33 - \text{pre-TUCE}} = \left[ \dfrac{\text{pre-TUCE}}{33} \right]^{r/s} \left[ \dfrac{T}{1 + T} \right]^{z} \prod_{i,j} (S_{ij})^{F_{ij}}\, e^{u - rv/s}$ 

In (14), it can be seen that the measure of learning, the fraction of the TUCE Gap closed, is 
positively related to pre-TUCE if r/s > 0. 
The reduced form (14) can be estimated by performing TSLS on its base e logarithmic 
form: 

(15) $\ln\left[ \dfrac{D\,TUCE}{33 - \text{pre-TUCE}} \right] = \dfrac{r}{s} \ln\left[ \dfrac{\text{pre-TUCE}}{33} \right] + z \ln\left[ \dfrac{T}{1 + T} \right] + \sum_{i,j} \ln(S_{ij})\, F_{ij} + (u - rv/s)$ 

The test that there was no difference in learning between control and experimental groups is the 
test that $S_{0i} = S_{1i}$, i = 1, ..., 6. 

The Gap-Closing model is misspecified in at least one important way, however. As 
written, (15) implies that random shocks scale up or down the effects of the explanatory 
variables and, thus, the model predicts that D TUCE will be signed identically for all students in 
group (i, j). If $S_{ij} > 0$, the model predicts D TUCE > 0. But this is a serious inadequacy since a 
student may guess the answer to questions he or she does not know. It is possible that a student 
who learned little and guessed a lot might score higher on the pre-TUCE than on the 
post-TUCE. It is also possible but was not the case that such a guessing phenomenon may be 
more likely at lower aptitude levels. 16 

The scheme used to estimate (15) is simply to drop from the sample those cases for which 
D TUCE < 0. As suggested above, this is done on the grounds that these were the cases for 
which it is most likely that guessing dominated learning in accounting for the TUCE scores 
observed. The resulting sample contained 178 cases from schools one, two, three, five, and six. 
The OLS and TSLS regression results are summarized in Table 4. 
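
The variable construction behind the logarithmic Gap-Closing regression can be sketched as follows. This is an illustrative sketch only: the function name, the student records, and the extra exclusions (zero gains, pre-TUCE scores of 0 or 33, zero study time, all of which would make a logarithm undefined) are assumptions added here, not details reported in the article.

```python
import math

MAX_SCORE = 33  # maximum TUCE score used in the text


def gap_closing_row(pre, post, hours):
    """Return (ln gap-closing measure, ln(pre/33), ln[T/(1+T)]) or None if unusable.

    Cases with D TUCE < 0 are dropped as described in the text; the other
    exclusions are assumptions needed for the logarithms to exist.
    """
    d_tuce = post - pre
    if d_tuce <= 0 or pre <= 0 or pre >= MAX_SCORE or hours <= 0:
        return None
    return (
        math.log(d_tuce / (MAX_SCORE - pre)),  # dependent variable
        math.log(pre / MAX_SCORE),             # coefficient estimates r/s
        math.log(hours / (1 + hours)),         # coefficient estimates z
    )


# Hypothetical (pre-TUCE, post-TUCE, weekly study hours) records.
students = [(10, 16, 4.0), (15, 13, 2.0), (20, 27, 6.0)]
rows = [row for row in (gap_closing_row(*s) for s in students) if row is not None]
for row in rows:
    print(tuple(round(value, 3) for value in row))
```

The second record has a negative gain and is dropped, mirroring the sample restriction described above.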

It is immediately clear that the fit is not very good but that the coefficient of pre-TUCE is 
significantly positive when estimated by TSLS while it is not when estimated by OLS. 17 Once 
again, the coefficient of the time variable is estimated to be insignificantly different from zero. 
It must be remembered that in the model (15) the estimates of the coefficients of the group 
dummy variables are estimates of logarithmic values. 

A test can be formulated of the hypothesis that $\ln(S_{0i}) = \ln(S_{1i})$ for all i = 1, 2, 3, 5, 6. 

Let $\hat{\beta}$ = the two-stage least squares estimates of the unconstrained model coefficients 
(Table 4) 

$\beta_0$ = the coefficient estimates of the constrained model 
$\Sigma^{-1}$ = the inverse of the estimated variance covariance matrix of $\hat{\beta}$ from the two 
stage unconstrained procedure. 

Then $(\hat{\beta} - \beta_0)'\, \Sigma^{-1}\, (\hat{\beta} - \beta_0)$ is distributed as a Chi square with ten degrees of 
freedom under the null hypothesis. 
The χ² statistic was estimated to be 3.07, well within the 90-percent acceptance region. 
Therefore the null hypothesis that there is no difference between control and experimental 
groups cannot be rejected. 
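
The quadratic form used in this test can be written out in a few lines. The sketch below is hedged: the coefficient vectors and the inverse covariance matrix in the toy example are invented purely to exercise the function, and only the reported statistic (3.07) and the stated ten degrees of freedom come from the text. The critical value 15.987 is the tabulated upper 10 percent point of a chi-square distribution with ten degrees of freedom.

```python
def wald_statistic(beta_unconstrained, beta_constrained, inverse_covariance):
    """Compute (b_hat - b_0)' S^{-1} (b_hat - b_0) for list inputs."""
    diff = [bu - bc for bu, bc in zip(beta_unconstrained, beta_constrained)]
    size = len(diff)
    return sum(
        diff[i] * inverse_covariance[i][j] * diff[j]
        for i in range(size)
        for j in range(size)
    )


# Toy two-coefficient example with invented numbers.
toy_statistic = wald_statistic(
    [0.5, -0.2],
    [0.4, 0.0],
    [[4.0, 0.0], [0.0, 1.0]],
)

CHI2_90_DF10 = 15.987    # tabulated 90th percentile, 10 degrees of freedom
reported_statistic = 3.07  # value reported in the text
reject_null = reported_statistic > CHI2_90_DF10
print(toy_statistic, reject_null)
```

Because 3.07 is far below 15.987, the comparison agrees with the conclusion that the null hypothesis cannot be rejected.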

There are at least two kinds of conclusions which can be reached on the basis of the 
reported research. If the view is taken that the aptitude which enters the learning production 
function is general aptitude, then the problem of simultaneous equations bias discussed in the 
first part of the previous section can be solved by using other measures of aptitude than 
pre-TUCE in the learning equation. No instrumental variables procedure would be needed. If 
the view is taken, however, that the aptitude which enters the learning production function is "aptitude 
in economics," then the procedures of the previous section would be indicated. In either case, 
future research in economic learning should strive to collect data which will give information 
on the aptitude of students in economics to use either as a replacement for pre-TUCE or as an 
instrument for it. 

Table 4 

Dependent Variable = Ln [D TUCE / (33 - pre-TUCE)] 

                   OLS Unconstrained Regression      TSLS Unconstrained Regression 
Variable           Coefficient   Standard Error      Coefficient   Standard Error 
Ln [T/(1 + T)]         .48            .73            [illegible]    [illegible] 
Ln (pre/33)            .01            .19               1.78            .75 
GP (0, 1)            -1.75            .27                .01            .79 
GP (1, 1)         [illegible]    [illegible]             .09            .74 
GP (0, 2)            -1.21            .29                .45            .76 
GP (1, 2)            -1.32            .28                .49            .81 
GP (0, 3)            -1.63            .30                .28            .85 
GP (1, 3)            -1.83            .28               -.09            .78 
GP (0, 5)            -1.54            .32                .54            .92 
GP (1, 5)            -1.50            .30                .54            .90 
GP (0, 6)             -.84            .34               1.17            .91 
GP (1, 6)             -.94            .31               1.04            .89 
                   R² = .14                           R² = -.31 
                   N = 178                            N = 178 

                   OLS Constrained Regression        TSLS Constrained Regression 
Variable           Coefficient   Standard Error      Coefficient   Standard Error 
Ln [T/(1 + T)]         .44            .71                .17            .86 
Ln (pre/33)            .01            .19               1.61            .67 
School 1             -1.66            .24               -.12            .68 
School 2             -1.27            .25                .30            .70 
School 3             -1.73            .26               -.09            .72 
School 5             -1.52            .28                .34            .81 
School 6              -.90            .28                .90            .79 
                   R² = .13                           R² = -.24 
                   N = 178                            N = 178 

Opportunity Cost of Learning 

The work of the second and third sections of this article suggests that there is little 
difference between the control and experimental groups in terms of economic learning. 
Furthermore, the student study time required to produce this learning appeared to be similar for 
both control and experimental groups. If student time was of equal value, one could conclude 
that the added fixed cost of the Martin material ($350/package, $150 for duplication, $3.50/ 
student workbook) is not justified. One caveat to this conclusion is worthy of mention, 
however. 

From the student wage data which was collected on a pre- and postcourse basis, it is clear 
that not all students faced the same opportunity cost of time. Hourly wages reportedly received 
by students ranged from zero for unemployed students to over $9 for students holding part- and 
full-time jobs. Assuming that reported average student wages over the time span of the course 
reflect the value of time for individual students, then the weekly cost of student study time is 
simply the product of this wage and their reported average weekly study time. Average 
aggregate weekly cost figures categorized by change in TUCE points, which may be thought of 
as average cost of learning data, are given in Table 5. The sample size of 107 for the control and 
106 for the experimental group reflects the number of students who were working during the 
course and reported wage data. 



Table 5 

Time and Opportunity Cost 
of Wage-Earning Students 

              Average Cost per Week 
D TUCE        Control        Experimental 
  -8          (blank)         $ 7.500 
  -7          (blank)         (blank) 
  -6             0               0 
  -5          19.816           9.150 
  -4             0               0 
  -3        [illegible]       16.234 
  -2          (blank)       [illegible] 
  -1          15.769          13.013 
   0        [illegible]     [illegible] 
   1          17.785          13.132 
   2          17.198          13.407 
   3          20.162          13.422 
   4          16.056          14.441 
   5          14.819          13.152 
   6          16.610          22.835 
   7          15.090          13.845 
   8          24.014          15.520 
   9          17.234          10.859 
  10          17.186          11.339 
  11          28.510          15.249 
  12          14.394          24.848 
  13          22.144          18.620 
  14             0               0 
  15             0             6.020 
  16             0               0 
  17          22.250             0 
  18          23.520           8.250 

              N = 107         N = 106 



Casual comparison suggests that the total weekly student cost of learning was less for the 
experimental group than for the control group, $1,575.95 per week versus $1,969.75 per week. 
This difference may be viewed as sufficient to offset all additional fixed costs of the Martin 
package; over a ten-week period this would imply a savings of nearly $4,000. 
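
The "nearly $4,000" figure follows directly from the two weekly totals quoted above. A minimal arithmetic check, using only those two totals and the ten-week horizon mentioned in the text (the variable names are illustrative):

```python
control_weekly_cost = 1969.75       # total weekly student cost, control group
experimental_weekly_cost = 1575.95  # total weekly student cost, experimental group
weeks = 10                          # horizon used in the text

weekly_saving = control_weekly_cost - experimental_weekly_cost
term_saving = weekly_saving * weeks
print(f"weekly saving = ${weekly_saving:,.2f}, ten-week saving = ${term_saving:,.2f}")
```

The ten-week difference comes to about $3,938, which is the basis for the rounded figure in the text.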

However, the average weekly student cost of learning per TUCE point, wage x study 
time/D TUCE, was not found to be statistically different in simple t-test comparisons; t = 0.811 
for the two groups at the .05 Type I error level. 

(16) (wage x study time)/D TUCE = 1.5613 + 0.2155 Conex 
                                  (s.e. = .1914)  (s.e. = 2.656) 
where Conex = 1 for control and 0 for experimental 

The apparent total cost benefit of being in the experimental group may simply be an artifact. 

Summary 

This research addressed four questions: First, can a learning model be specified within 
which learning can be justifiably examined in control-experimental groups? Second, is there 
any difference in the quantity of economics learned between community college control groups 
and experimental groups using the David Martin audiovisual-tutorial approach? Third, what 
influence does student classroom and study time have on learning? Fourth, is learning produced 
in the experimental sections less costly to students than learning produced in the control 
sections? 

The relatively standard learning model in which pre- to postchanges in TUCE scores are 
regressed on aptitude and other control variables, via OLS, was shown to result in biased and 
inconsistent estimates when the pre-TUCE is used as a proxy for aptitude. Two stage least 
squares estimates of the coefficient on pre-TUCE were shown to be significantly positive. In 
addition, the ceiling effect, known to exist in work with the TUCE, was formally modeled 
through nonlinear equation specifications for the relationship between aptitude and pre-TUCE 
and the relationship between changes in TUCE and aptitude. From this equational system the 
"Gap-Closing Model" was derived as a reduced form equation. 

In both the linear and nonlinear model specifications, the hypothesis that increases in 
economic learning result from the use of the David Martin audiovisual-tutorial package was not 
accepted. Student classroom and study time also did not show itself to be a significant input in 
the learning of economics. When properly estimated by TSLS the pre-TUCE effect on student 
learning was positive and significant. No statistically significant difference was found between 
the control and experimental groups' average weekly student cost of learning per TUCE point. 

In short, little difference was found in this study to justify the added cost of the David 
Martin package. This study does, however, provide a sound statistical modeling procedure 
which previously has not been attempted in economic education. 

Footnotes 

1 See, for example [26, 10, 16, 19, 23, 22, 17, 7, 40 and 34]. Walstad [35] provides a review of most of this 
literature. 

2 See [33 and 24]. 

3 The importance of model specification in hypothesis-testing has been recognized by statisticians for some 
time. Educators, such as Dubin and Taveggia [11] and Brown [5], have indicated the need for, and 
consequence of not, developing formal models describing the teaching-aptitude-learning linkages. Yet, 
economic education evaluators still carry out simple t-test comparisons, assume ad hoc linear regression 
models or include variables as regressors on the basis of a questionable stepwise procedure; see Bishop's 
1976 Delta Pi Epsilon Research Award winning study [4] as a typical example. Recently, a handful of 
economic education researchers have come to acknowledge the importance of appropriate modeling; see 
the Soper-Saunders [30 and 27] interchange and the Soper-Becker-Soper-Highsmith interchange [31, 3, 
32, 18]. 

4 The problems involved in using the TUCE in community colleges are well documented [9, 36, 21, 37]. 
Also, see [8] regarding the questionability of using the TUCE in pre-post testing. 

5 Early evaluation by Paden and Moyer [25], Saunders [28], and Lewis and Dahl [20], however, did make 
use of the average number of hours a student spent studying economics per week. Few other researchers 
have addressed the impact of time on learning. 

6 The Minnesota Scholastic Aptitude Test (MSAT) scores or a comparable alternative were available for 
only 219 students while pre-TUCE scores were available for 330 students. Therefore, using the 
pre-TUCE as a proxy for aptitude provides more than a 50 percent sample size advantage. In addition, 
pre-TUCE is a specific measure of economic aptitude while MSAT is a general measure of aptitude. 
Several authors, however, have been successful in using general aptitude as a determinant of specific 
economics in a gap-closing model; see, for example [1]. 

7 The ceiling effect on the TUCE has been noted by many researchers; see, for example [12, 21]. 

8 If satisfactory measures of aptitude are available, researchers will then specify that "value added" 
(post- minus pretest) is a direct function of the aptitude measure; see, for example [25]. More commonly, 
however, an "absolute level model" (posttest is dependent variable) or the "Gap-Closing Model" 
(change in test scores divided by max-change) is arbitrarily specified with the pretest score and/or 
aptitude used as a regressor; see, for example [31, 36, 38]. The OLS estimation bias caused by serial 
correlation inherent in the absolute level model where pre-TUCE is used as a regressor is addressed in 
[3]. 

9 Walstad [35] provides a summary of these yearly reports and gives initial data comparisons. 

10 The Minnesota community college instructors had participated in earlier experiments, such as those 
described in [21 and 38]. 

11 The fact that an individual's characteristics are reflected in his aptitude is recognized by psychologists and 
educators. For example, Brown [6, p. 314] writes: "An individual's performance on a given task is not 
determined solely by situation forces but is also a function of the characteristics of the individual, his 
aptitudes" (emphasis ours). This is a controversial issue, however. Therefore, future researchers may 
want to include explicitly individual characteristics in their learning model specifications. 

12 The presence of biased regressor coefficient estimates also implies that residual estimates are biased. As 
such, Goldfeld and Quandt [15] and Glejser [14] type tests for homoscedasticity are questionable. 

13 See footnotes 8 and 14. 

14 Siegfried [29] notes that Whitney [39] was the first to recognize the ceiling effect in economic education 
test instruments and to propose the "Gap-Closing" measure to account for nonlinearity; see footnote 15. 

15 Gery [12] is typically recognized for his work with the TUCE ceiling effects and this "Gap-Closing" 
measure; see footnote 14. 

16 The sample which could be used to estimate the learning equation by TSLS is the subset of the original 
sample (N = 330) containing those 219 cases for which Minnesota Scholastic Aptitude Test (MSAT) 
scores were available. To test for equal distribution of the 41 negative D TUCE scores throughout the 
sample for which MSAT scores were available, the sample was divided into quintiles based on MSAT 
scores. The following table summarizes the number of cases with D TUCE < 0 found in each quintile: 

Number of Cases with D TUCE < 0, by Quintile 

Quintile                            1st (low)    2nd    3rd    4th    5th 
Number of cases with D TUCE < 0         7         14      7      8      5 

Under the null hypothesis that the 41 cases are distributed randomly among the 219 observations the 
equation below is distributed as a Chi-square with four degrees of freedom: 

$\chi^2 = \sum_{i=1}^{5} \dfrac{[f_i - (0.2)(41)]^2}{(0.2)(41)}$ 

where $f_i$ = number of cases in the ith quintile 

The χ² value is calculated to be 5.71; the null hypothesis that the cases for which D TUCE < 0 are evenly 
distributed across aptitude levels is accepted at an 80-percent confidence level. 

17 The presence of negative R² values for TSLS regressions is the consequence of the manner in which the 
regression sum of squares is calculated. 
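
The chi-square statistic reported in footnote 16 can be reproduced from the quintile counts given there. The short check below uses only those counts; the critical value 5.989 is the tabulated 80th percentile of a chi-square distribution with four degrees of freedom, and the variable names are illustrative.

```python
observed = [7, 14, 7, 8, 5]  # cases with D TUCE < 0 in MSAT quintiles 1 (low) to 5
total = sum(observed)        # 41 negative-gain cases in all
expected = 0.2 * total       # equal share per quintile under the null hypothesis

chi_square = sum((f - expected) ** 2 / expected for f in observed)

CHI2_80_DF4 = 5.989  # tabulated 80th percentile, 4 degrees of freedom
accept_null = chi_square < CHI2_80_DF4
print(f"chi-square = {chi_square:.2f}, accept null at 80 percent: {accept_null}")
```

The computed value agrees with the 5.71 reported in the footnote and lies below the critical value.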
Assessing the Impact of an Instructional 
Innovation on Achievement Differentials: 
The Case of Computer-Assisted Instruction 

John F. Chizmar, L. Dean Hiebert and 
Bernard J. McCarney 



This paper sets forth an alternative to the conventional procedure for assessing the 
quantitative impact of an instructional innovation on achievement differentials. The instructional 
innovation chosen for the purposes of this paper was computer-assisted instruction 
(CAI). Any innovation, however, which affects student allocation of time could be analyzed in 
a similar manner. 

The impact of the innovation was evaluated by using a procedure which allows it to affect 
the production of cognitive achievement in a way which is more general than usually specified. 
Typically, the effect of an innovation is specified by a neutral shift parameter in the production 
function for cognitive achievement. In the procedure adopted in this paper, the innovation can 
also affect cognitive achievement by separately influencing the "productivities" of student 
characteristics. Using the more general procedure, the differential in mean TUCE scores 
between users and nonusers of CAI was decomposed into the effect due to CAI and the effect 
due to differences in student characteristics. Evidence is found that users of CAI performed 
minimally better than nonusers of CAI as measured by a "TUCE differential," but they did so 
in spite of CAI, rather than because of it. 

The Experiment 

The experiment was conducted in the Fall 1975 semester at Illinois State University. A 
large section (N = 380) of the macro principles of economics course constituted the population 
for the experiment. CAI in economics was viewed as a complement to on-going instruction in 
that the system was applied as an appendage to a "traditional" macro principles course. The 
experimental design permits an examination of the impact of CAI within the context of the 
student's choice decision, since students were given the option to use or not use CAI at their 
own discretion. In addition, an incentive system was created to encourage student utilization of 



In general, a CAI system could increase cognitive achievement for the following reasons: 

1. The computer permits elements of self-pacing. 

2. The computer provides instantaneous positive reinforcement and instantaneous 

3. The CAI packages and class attendance may be complementary in the production of 
cognitive achievement. If the student uses a "review" package after attending the 
appropriate lecture, the review routine would give him an evaluation of his knowledge, 
allowing the student to test himself, reinforce concepts presented in the lecture, and 
remedy deficiencies by means of prompt and further study. 

However, the student may perceive that CAI packages can be substituted for class attendance in 
the production of cognitive achievement. If the use of CAI review packages reduces the cost of 
production (including time costs), the student may simply adopt the new method in the 
production of a target level of achievement. 2 Hence, one cannot make an a priori judgment 
about the overall impact of CAI on student achievement. 

A single output measure was employed to evaluate the CAI technology: student scores on 
Part I, Form B, of the nationally normed Test of Understanding in College Economics (TUCE), 
administered in a pre- and postcourse manner. 3 

Decomposition of the Raw TUCE Differential 

If CAI presents a genuine method of enhancing cognitive achievement, then the test 
performance of the CAI user group should be significantly higher than that of the nonuser 
group, holding constant other factors such as ability, motivation and maturity which may 
influence test performance. The conventional format 4 for assessing the impact of an innovation 
on cognitive achievement assumes that the sole effect of the innovation is captured by a single 
coefficient in the equation explaining cognitive achievement. A more general specification of 
the effect of an innovation would allow all the coefficients to differ between users and nonusers. 
This allows CAI, say, to affect cognitive achievement by separately affecting the 
"productivities" of the individual human capital inputs. Because students could select to use CAI, 
the characteristics of users may be different from the characteristics of nonusers. Thus, 
cognitive achievement may differ between users and nonusers because characteristics differ 
and because the coefficients differ. It is of interest, then, quantitatively to assess the magnitude 
of these two causes of differences in achievement between users and nonusers. 

Following a methodology used by Blinder [1] in a different context but which is generally 
applicable, we propose a decomposition of the raw TUCE differential between users and 
nonusers into the estimated effects due to differences in individual characteristics and the 
estimated effects due to differences in coefficients. The latter effect is attributed to CAI. To 
allow explicitly for differences in coefficients between the two groups, separate regressions 
were estimated for each group: 

(1) $\text{Post-TUCE}_i^u = \beta_0^u + X_i^u \beta^u + \epsilon_i^u$ 

and 

(2) $\text{Post-TUCE}_i^n = \beta_0^n + X_i^n \beta^n + \epsilon_i^n$ 

where 

$X_i$ = row vector of individual characteristics for the ith student. Includes a proxy for 
student scores on the post-TUCE (PROXY) designed following a procedure 
suggested by Soper and Thornton [5], the student's age in years (AGE), student's 
grade point average (GPA), and student's sex (SEX) measured as a binary variable 
with 1 = male. 

$\beta$ = column vector of coefficients 

$\epsilon_i$ = disturbance term 

and where the u superscript indicates that the student belongs to the CAI user group and n 
indicates membership in the nonuser group. 

Evaluating equations (1) and (2) at the mean levels for the characteristics and subtracting 
yields the TUCE differential: 

(3) $D = (\beta_0^u - \beta_0^n) + (\bar{X}^u \beta^u - \bar{X}^n \beta^n)$ 

The conventional format assumes that the sole effect of an innovation such as CAI is $(\beta_0^u - \beta_0^n)$. 
However, using the user group coefficients as weights, the TUCE differential can be 
decomposed as: 

(4) $D = (\beta_0^u - \beta_0^n) + [\,(\bar{X}^u - \bar{X}^n)\,\beta^u + \bar{X}^n (\beta^u - \beta^n)\,]$ 

The first term within brackets is an effect due to differences in individual characteristics and the 
second term is an additional effect due to CAI which has been ignored in the past. 

Alternatively, the TUCE differential could be decomposed using nonuser coefficients as 
weights to yield: 

(5) $D = (\beta_0^u - \beta_0^n) + [\,(\bar{X}^u - \bar{X}^n)\,\beta^n + \bar{X}^u (\beta^u - \beta^n)\,]$ 

Since neither decomposition is preferred theoretically, the arithmetic mean of the two 
procedures will be used. 
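
The two weighting schemes can be written as one small function to show that both reproduce the same raw differential. This is a sketch under stated assumptions: the coefficient values are the regression estimates reported with Table 1, but the group means `xbar_u` and `xbar_n` are hypothetical placeholders (the article does not report them), so the resulting numbers are illustrative and do not reproduce the article's differential.

```python
def dot(values, weights):
    """Inner product of two equal-length lists."""
    return sum(v * w for v, w in zip(values, weights))


def decompose(b0_u, beta_u, xbar_u, b0_n, beta_n, xbar_n, weights="user"):
    """Split the mean differential into intercept, characteristics and coefficient parts."""
    intercept_part = b0_u - b0_n
    mean_gap = [xu - xn for xu, xn in zip(xbar_u, xbar_n)]
    coef_gap = [bu - bn for bu, bn in zip(beta_u, beta_n)]
    if weights == "user":  # user-coefficient weighting
        characteristics_part = dot(mean_gap, beta_u)
        coefficient_part = dot(xbar_n, coef_gap)
    else:                  # nonuser-coefficient weighting
        characteristics_part = dot(mean_gap, beta_n)
        coefficient_part = dot(xbar_u, coef_gap)
    return intercept_part, characteristics_part, coefficient_part


# Coefficients on PROXY, GPA, AGE, SEX as reported with Table 1.
b0_u, beta_u = 3.159, [0.346, 3.477, 0.186, 1.257]
b0_n, beta_n = 11.503, [0.502, 2.589, -0.141, 0.940]
# Hypothetical group means, invented for illustration only.
xbar_u = [12.0, 2.90, 19.6, 0.55]
xbar_n = [12.6, 2.55, 19.5, 0.62]

raw_differential = (b0_u + dot(xbar_u, beta_u)) - (b0_n + dot(xbar_n, beta_n))
user_parts = decompose(b0_u, beta_u, xbar_u, b0_n, beta_n, xbar_n, "user")
nonuser_parts = decompose(b0_u, beta_u, xbar_u, b0_n, beta_n, xbar_n, "nonuser")
print(raw_differential, sum(user_parts), sum(nonuser_parts))
```

The characteristics and coefficient parts differ between the two weightings, but each set of parts sums to the same raw differential, which is why averaging the two is a reasonable compromise.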

Table 1 decomposes the TUCE differential into its separate parts. The raw TUCE 
differential amounts to .6058. The effect of CAI is found by subtracting the effects of 
differences in individual characteristics. Because there are two weighting schemes, there are two 
estimates. Using user regression weights, the effect due to CAI equals -.3246. Using nonuser 
regression weights, the effect due to CAI equals .0873. The arithmetic mean of these two 


estimates is -.1187. Thus, CAI accounts for approximately -19.59 percent of the raw TUCE 
differential. 

Table 1 

Decomposition of the Raw TUCE Differential 

                                  User Regression Weights     Nonuser Regression Weights 
Characteristics                   Adjustment(a)   % of D      Adjustment(a)   % of D 
Raw TUCE differential (D)             .6058       100%            .6058       100% 
Adjustment for user differences in: 
  PROXY                               .2042        33.71%         .2962        48.89% 
  GPA                               -1.2079      -199.29%        -.8994      -148.46% 
  AGE                                -.0198        -3.27%         .0150         2.48% 
  SEX                                 .0931        15.37%         .0697        11.51% 
Effect due to CAI                    -.3246       -53.58%         .0873        14.41% 

(a) The user and nonuser regression weights came from estimating equation (1), 

$\text{POST-TUCE}_i^u = 3.159 + .346\,\text{PROXY}_i^u + 3.477\,\text{GPA}_i^u + .186\,\text{AGE}_i^u + 1.257\,\text{SEX}_i^u \quad (R^2 = .3732)$ 
with significance levels (.294), (.000), (.000), (.239) and (.007) for the constant, PROXY, GPA, AGE and SEX, 

and equation (2), 

$\text{POST-TUCE}_i^n = 11.503 + .502\,\text{PROXY}_i^n + 2.589\,\text{GPA}_i^n - .141\,\text{AGE}_i^n + .940\,\text{SEX}_i^n \quad (R^2 = .3731)$ 
with significance levels (.179), (.003), (.000), (.759) and (.279) for the constant, PROXY, GPA, AGE and SEX. 

The level at which the coefficient is just significant is indicated in parentheses. 
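
The arithmetic behind the decomposition figures can be checked directly from the published adjustments. The sketch below uses only numbers reported in Table 1 and in the text; the dictionary layout and variable names are illustrative choices.

```python
raw_differential = 0.6058

# Adjustments for user differences, as printed in Table 1.
user_adjustments = {"PROXY": 0.2042, "GPA": -1.2079, "AGE": -0.0198, "SEX": 0.0931}
nonuser_adjustments = {"PROXY": 0.2962, "GPA": -0.8994, "AGE": 0.0150, "SEX": 0.0697}

# Effect due to CAI = raw differential plus the signed adjustments.
cai_user = raw_differential + sum(user_adjustments.values())
cai_nonuser = raw_differential + sum(nonuser_adjustments.values())

mean_cai_effect = (cai_user + cai_nonuser) / 2
share_of_differential = 100 * mean_cai_effect / raw_differential

print(f"user weights: {cai_user:.4f}, nonuser weights: {cai_nonuser:.4f}")
print(f"mean effect: {mean_cai_effect:.4f}, share of D: {share_of_differential:.2f}%")
```

The results match the reported effects of -.3246 and .0873, their mean of about -.1187, and the share of roughly -19.59 percent.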

This analysis, therefore, suggests that the effect of CAI is negative but small. Users as a 
group performed slightly better in relative terms on the TUCE than nonusers. But they did this 
in spite of CAI primarily because they had more ability as measured by GPA. The analysis also 
suggests that the students with more ability choose to use CAI, at least within the scenario in 
which this experiment was run. 5 

These results, however, do not necessarily lead to the conclusion that this application of 
CAI is cost ineffective. Suppose that students who elect the CAI option are simply substituting 
CAI review packages for class attendance. If this production technique reduces the costs of 
production, then (given achievement in economics) the student can produce higher achieve- 
ment in other courses (or consume more leisure). These latter benefits, when compared with the 
costs of CAI, could yield a favorable assessment of the cost effectiveness of CAI. 

In summary, utilizing a more general specification of the impact of an innovation on 
cognitive achievement, we find that the effect of CAI on cognitive achievement after adjusting 
for differences in characteristics between users and nonusers is negative but small. We can only 
speculate about the reasons for this result. One possible interpretation is that achievement in 
other courses (together with leisure) is substituted for achievement in economics by students 
who elect the option to use CAI. An alternative explanation would recognize that the user 
student must select programs under uncertainty. In the absence of complete information 
concerning the "payoffs" associated with each program, it is unlikely that users would succeed 
in finding efficient input combinations to utilize in the production of achievement. The proper 
interpretation of results concerning the impact of CAI is clearly an important topic for future 
research. 



Footnotes 

1 The CAI system consists of three different types of programs that have been labeled "reviews," 
"demonstrations" and "simulations." This system was developed at the University of Notre Dame by 
Frank J. Bonello and William I. Davisson [2, 3] under a Sloan Foundation grant. The developers view the 
"review" routines as the heart of this system. 

2 In fact, Kelley [4] argues that achievement in economics may well be an inferior good. 

3 The authors chose to use the TUCE as the cognitive evaluation instrument because it will permit 
comparisons with the experiments conducted by Bonello and Davisson. In addition, the use of the TUCE 
reduces the possibility of inadvertently preordaining our results by devising an instrument that is biased 
toward either users or nonusers of CAI. Finally, the TUCE embodies the overall objectives of the macro 
principles course. 

4 The results obtained from estimating the conventional format are available upon request to the authors. 

5 The conclusion that CAI exerts a significant negative impact on cognitive achievement differs from that 
reported by Bonello and Davisson. They report [3] on two different experiments which utilize the 
conventional format. In the first experiment CAI was viewed as a substitute for discussion groups and in 
the second experiment CAI was used as an appendage to a traditional principles course. The authors 
analyze both experiments using ordinary least squares regression and conclude from both experiments that 
CAI had an insignificant effect on cognitive achievement. 

ECONOMIC EDUCATION 

What do Economics Majors Learn? 

By David G. Hartman* 



The introductory economics course has 
been studied extensively, with respect to 
both course content and instructional 
methods. Far less is known about the re- 
mainder c the program for undergraduate 
economics majors. As/an initial step toward 
understanding and improving the educa- 
tional experience of economics majors, this 
paper examines what economics majors 
learn. Specifically, the primary results give 
information about what students at Har- 
vard learn, but it is hoped that insights into 
the experience of students in genera! can be 
gained. Because my primary concern is 
with genera] economic understanding, the 
emphasis of this paper will be on how well 
students learn micro- and macroeconomic 
concepts and how to apply them. The study 
begins with a few casual observations. 

Before any formal analysis was un- 
dertaken, there were reasons to believe that 
improvements in the economics program 
were needed. First, as one who has taught 
introductory economics at Harvard and 
also been in charge of the general examina- 
tions required of all graduating economics 
majors, I have often suspected that either 
students in the introductory course know 
much less than it appears or there is a siz- 
able group of economics majors who leave 
school with little more than the level of eco- 
nomic understanding they achieved by the 
end of their first course. This evidence is, of 
cdt$&e, °^ " m ' tec ' usefulness because the 
passage of time would tend to erode the 
students' skills. Moreover, most graduating 

*Harvard University. I wish to thank President 
Derek C. Bok for his interest in and funding of this effort. 
I owe Elizabeth Allison a great debt for her encouragement 
and advice. Liam P. Ebrill, Jeff 
Wolcowitz, and Ken Sokoloff did a heroic job of extracting 
the needed data from general examinations. 
David Lindauer provided many useful comments on 
an earlier draft. 



seniors are at least a year away from their 
last general micro or macro course. A more 
formal analysis will be presented below to 
isolate the impacts of various parts of the 
economics program on student perfor- 
mance. 

A secondary source of casual evidence is 
the students themselves. It is no secret that 
a large majority of Harvard economics 
undergraduates feel that the introductory 
course is the high point of the program, 
with a number of the subsequent courses, 
particularly the intermediate theory sequence, 
considered highly repetitious of 
material covered in introductory eco- 
nomics. The feeling is widespread that a 
general analytical ability is not being 
developed as it should be in an under- 
graduate program. There are several points 
to be considered about such student com- 
plaints. First, having witnessed similar 
student feelings at another school (where 
the point was not made as vocally as at 
Harvard), I find it difficult to accept this 
dissatisfaction as unique. Whether it high- 
lights a significant problem is a separate 
issue. Although I tend to take fairly 
seriously the complaint of a broad group of 
students that courses are not sufficiently 
rigorous, it is certainly possible that a great 
deal of useful skill is being acquired. It may 
be that some courses only seem repetitious 
because the beginning course provides at 
least a brief introduction to all the major 
topics to be examined. Graduate teaching 
assistants in the intermediate micro-macro 
sequence support this hypothesis by point- 
ing out that students often perform poorly 
because, having the illusion of already 
knowing the material, they put too little ef- 
fort into the intermediate courses. Finally, 
it is far from clear, without further investi- 
gation, that repetition is not a valuable use 
of time, even if students complain. 



SOURCE: American Economic Review, vol. 68, no. 2, May 1978, pp. 17-22. Reprinted with permission of the author 
and publisher. 
Table 1. Characteristics of Honors and 
Nonhonors Economics Students 

                                        Honors     Nonhonors   All
                                        Students   Students    Students

Number in entire 1977 class               61         118         179
Number of students in sample              50          44          94
Number of students having taken
  micro courses:
    Traditional micro course              43          26          69
    Policy micro course                    4           5           9
    Graduate micro course                  3           2           5
Number of students having taken
  no micro course                          0          11          11
Number of students having taken
  macro courses:
    Traditional macro course              27          18          45
    Graduate macro course                 23          10          33
Number of students having taken
  no macro course                          0          16          16
Average introductory micro course
  grade (0-15 scale)                   11.45        9.98       10.79
Average introductory macro course
  grade (0-15 scale)                   11.86       10.56       11.27





These issues are essentially empirical; 
empirical evidence to be presented will 
hopefully provide some answers. The 
investigation will proceed by first discuss- 
ing specific goals of the economics 
program, then suggesting a method for 
measuring how well students meet these 
goals, and finally attempting to discover the 
contributions of various parts of the 
program to the students' achievement. 

The general goals of an economics 
program are highly controversial, as Robert 
Horton and Dennis Weidenaar discovered 
in their survey. The more specific objectives 
implicit in the Harvard method of 
evaluating graduating students seem 
roughly consistent with the Horton and 
Weidenaar "consensus" goal; the Harvard 
general examinations in economics are 
designed to test a student's ability to use 
economic tools to answer real world ques- 
tions. Specifically, a student's knowledge 
of macro theory, knowledge of micro 
theory, ability to apply macro theory, and 
ability to apply micro theory are evaluated 
and can be given separate scores. For pur- 
poses of this paper, attainment of skill in 
those four categories will be taken as the 
objective of the program. Because it 
represents the best information available, 



the score on the general exam in each cate- 
gory will be used to measure what an eco- 
nomics student has learned. The problems 
associated with using test scores to 
represent achievement should be kept in 
mind as the empirical results are analyzed. 

I. Majoring in Economics at Harvard 

Economics students choose between an 
"honors" major and a "nonhonors" major. 
In a typical year, there are about 75 honors 
and 125 nonhonors majors. Only honors 
students are required to complete a course 
in both intermediate micro and macro and 
to write a senior thesis. Although 
nonhonors students have neither require- 
ment, many complete the intermediate 
theory courses because of an interest in the 
topic, because they wish to be well pre- 
pared for the general exam, or because they 
had at some time intended to pursue an 
honors program. A good grade record is re- 
quired for graduation with honors, so there 
is a natural presumption that honors majors 
are better students. However, there are a 
number of reasons for good students to do a 
nonhonors program, including the desire to 
take a wider range of courses than the 
honors requirements would allow. Table 1 
gives comparisons of 1977 honors and 
nonhonors students in terms both of grades 
in introductory courses and of elective 
courses taken. The comparisons are based 
on the sample of students to be used in the 
empirical study. 1 

Honors students have a number of op- 
tions in meeting the intermediate micro- 
macro requirement. The courses with the 
largest numbers of students are the "traditional" 
micro and macro offerings. The empirical 
investigation which follows is based 
on students who took the 1977 general 
examinations, but who took the traditional 
micro or macro course at a variety of points 
in their careers, with different instructors 
and texts. In addition, the traditional 
courses apparently are becoming more like 
the "alternative" intermediate courses 
than were the traditional courses taken by 
most 1977 graduates. 

There are two other courses in micro 
which satisfy the honors requirement. Almost 
10 percent of the sample students took 
the policy and applications oriented micro 
course, which is taught by the case method 
with a large number of problem sets and a 
great deal of assigned reading. The other 
course, with about 5 percent of the sample 
students, is taught at nearly the level of so- 
phistication of the course intended for eco- 
nomics Ph.D. students. It requires a 
mathematical preparation beyond most 
undergraduates and is also taken by 
graduate students from other departments. 
Because of this variety in the Harvard 
program, it will be possible to determine 
the impact on learning of the traditional 
micro course compared to two very dif- 
ferent alternatives (to be called the 
"policy" micro course and the "graduate" 
micro course); a reference group is avail- 
able since 12 percent of the students had 
had no micro course past the introductory 
level. 

There is only one alternative to the tradi- 

1 The 94-student sample on which these comparisons 
are based results from requiring that personal data be 
available and that the students included took the 
spring 1977 exams. Nonhonors students may sub- 
stitute a fall exam, so they are underrepresented in the 
sample. I have no reason to believe, however, that this 
sample is biased in any way critical to the results. 



tional macro offering. It was taken by 35 
percent of the sample students. Despite the 
fact that this course is intended to be taught 
at nearly the level of the course for economics 
Ph.D. students, and is taken by 
graduate students from other departments, 
the level of math required is not above that 
of most Harvard undergraduates. The ma- 
terial presented not only is more difficult 
than that in the traditional macro course 
(particularly since the course employs an 
extensive reading list rather than a basic 
text), but also puts more emphasis on 
policy. An attempt will be made to assess 
the impacts of the traditional course and 
this graduate course on student perfor- 
mance. Once again, the reference group 
consists of those students (17 percent of the 
sample) who had had neither course. 

II. Measurement and Empirical Results 

Graders of the 1977 general examinations 
were asked to assign each student four 
grades (each on a 0-10 scale): 1) micro 
theory; 2) micro application; 3) macro 
theory; and 4) macro application. Each 
question was graded by two graduate 
students familiar with the undergraduate 
micro-macro courses. Since different ques- 
tions are asked on the honors and non- 
honors exams, it was important that 
specific grading standards be adopted. The 
graders reported little problem with consis- 
tency in scoring the two exams because of 
the similarity and generality of the questions 
asked. However, a concern was 
expressed, particularly by the macro 
graders, about their ability to separate the 
students' knowledge of theory from their 
ability to apply it, based on answers to 
these general questions. The empirical 
results tend to confirm this difficulty with 
macro; I suspect that it is in the nature of 
macroeconomics for theory and application 
to be so closely related. 

To allow a direct confirmation of grading 
consistency across the two examinations, a 
"practice examination" was given to thirty 
of the students two weeks prior to the 
general exams. The practice test consisted 
of multiple choice questions chosen to fall 
into the four categories of interest. By having 
a set of scores on a common test to 
compare with the scores on the honors and 
nonhonors exams, it was possible to verify 
that in no case was there any significant 
bias in the scoring of honors and nonhonors 
general exams. 

Separate regressions were run to esti- 
mate the impact of courses taken as part of 
an economics major on students' measured 
knowledge of micro theory, ability to apply 
micro theory, knowledge of macro theory, 
and ability to apply macro theory. A simple 
model is assumed: an economics major's 
level of skill in each of the four categories 
depends on his/her innate ability, the skill 
developed in introductory economics, and 
the amount learned in the relevant courses 
taken as part of the economics program. 
Measurement of the first two factors will be 
discussed as the empirical results are 
presented. The contribution of the eco- 
nomics program to a student's skill, ob- 
viously the factor most important to isolate 
for purposes of this paper, is estimated us- 
ing a set of variables indicating which 
courses a student has taken. With respect 
to microeconomics, three dummy variables 
indicate whether a student has taken the 
traditional micro course, the policy micro 
course, or the graduate micro course (or, of 
course, none of these). Another variable is 
the number of "micro related field" 
courses a student has taken. 2 In the 
macroeconomics regressions, two dummy 
variables indicate whether a student has 
taken the traditional macro course or the 
graduate macro course (or neither). 
Another variable is the number of macro re- 
lated field courses a student has taken. 3 

In the initial attempt to explain students' 



2 As used in this study, microeconomics related field 
courses are defined as: economic principles and public 
policy; public finance; development economics; 
international trade and investment; economics of 
managerial decisions; business organization and be- 
havior; markets and market structure; labor eco- 
nomics; urban economics. 

3 As used in this study, macroeconomics related field 
courses are defined as: monetary theory and financial 
institutions; applied macroeconomics; public finance; 
international monetary economics; development economics. 



knowledge of micro theory (equation (1) in 
Table 2), grades in the microeconomics half 
of the introductory course were used to 
represent both ability and the learning acquired 
in the course. The rest of the equation 
consists of the "economics major" 
variables discussed above. The dependent 
variables and introductory course grade 
variables are expressed in logs in every 
estimation. So, the coefficients on the 
micro course dummy variables and the field 
course variable in (1) are the estimated 
percentage increases in micro theory score 
produced by taking the associated course. 
Therefore, the traditional course is esti- 
mated to increase one's micro theory score 
by about 24 percent, the policy micro 
course by 44 percent, the graduate micro 
course by 51 percent, and an additional 
micro field course by 9 percent. All of these 
effects are significant, or very close to 
significant, at the 5 percent level. 
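
The reading of a dummy-variable coefficient in a log-score regression as a percentage effect can be illustrated with a short calculation. The sketch below is not part of the original analysis; the function name `pct_effect` and the sample coefficients are assumptions chosen for illustration. When the dependent variable is in logs, a coefficient b on a 0/1 dummy implies a proportional change of exp(b) - 1, which is close to b itself only when b is small.

```python
import math

def pct_effect(b: float) -> float:
    """Exact percentage change in the unlogged score implied by a
    coefficient b on a 0/1 dummy when the dependent variable is in logs."""
    return 100.0 * (math.exp(b) - 1.0)

# Illustrative coefficients only (hypothetical, not taken from Table 2).
for b in (0.05, 0.20, 0.50):
    print(f"b = {b:.2f}: approximate {100 * b:.0f}%, exact {pct_effect(b):.1f}%")
```

The gap between the approximate and exact readings widens as the coefficient grows, so larger dummy coefficients are better reported with the exact form.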

These results should, however, be 
regarded with suspicion because of inclu- 
sion of only the introductory course grade 
to measure ability. In particular, the equa- 
tion (1) result could occur as a consequence 
of students choosing courses based on their 
ability. For example, if the most able eco- 
nomics students choose the graduate micro 
course, its larger estimated impact need not 
be evidence of any learning premium over 
the traditional course. I would anticipate 
that the most significant bias results from 
not having any measure of math skill in 
equation (1), since the best information a 
student has about his aptitude for micro 
(aside from math) is probably his introduc- 
tory course grade. 
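
The selection problem described above can be sketched with simulated data. Everything in the example is an assumption made for illustration: the sample size, the true course effect of 0.10, the ability weight of 0.30, and the rule by which abler students elect the course are invented, and the helper names are hypothetical. It compares a naive difference in mean log scores with the course coefficient from a regression that also controls for ability.

```python
import random

def ols_two_regressors(y, x1, x2):
    """Slope on x1 from an OLS regression of y on a constant, x1 and x2."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the two normal equations for the coefficient on x1.
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

def simulate(n=5000, true_effect=0.10, seed=0):
    rng = random.Random(seed)
    ability, course, log_score = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        # Assumed selection rule: abler students are more likely to enroll.
        c = 1.0 if a + rng.gauss(0, 1) > 0.5 else 0.0
        y = true_effect * c + 0.30 * a + rng.gauss(0, 0.2)
        ability.append(a)
        course.append(c)
        log_score.append(y)
    takers = [y for y, c in zip(log_score, course) if c == 1.0]
    others = [y for y, c in zip(log_score, course) if c == 0.0]
    naive = sum(takers) / len(takers) - sum(others) / len(others)
    adjusted = ols_two_regressors(log_score, course, ability)
    return naive, adjusted

naive, adjusted = simulate()
print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  true: 0.100")
```

Under these assumptions the naive comparison overstates the course effect because it also picks up the ability gap between those who take the course and those who do not; adding the ability measure brings the estimate back toward the true value.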

Numerous measures of student ability 
including SAT scores, high school class 
rank (adjusted for school size), and in- 
terview ratings, as well as data on race and 
sex, were available from admissions in- 
formation. The SAT math scores, as ex- 
pected, were the only significant ability fac- 
tor in any regression. The inclusion of the 
SAT scores (equation (2)) reduces to insignificance 
the introductory micro course 
grade variable. Since the sign is then 
wrong, the coefficient is constrained to be 
zero. 
The resulting equation (3) has 
substantially lower estimated impacts for 
components of the economics program, 
confirming the extent of bias introduced 
when math ability is left out. Now, only the 
policy micro course meets the strict cri- 
terion for significance normally employed; 
the number of micro field courses does, 
however, have a significant impact at the 
.05 level in a one-tail test. The graduate 
micro course (.10 significance level) and the 
traditional course (.20 level) do not fare so 
well. The small number of students having 
taken the graduate course (see Table 1) may 
be a contributing factor to its insignificance. 

The corresponding equation ((6)) for 
ability to apply micro concepts indicates 
that, of the components of the economics 
major, only the policy micro course has an 
impact on student performance significant 
at even the .10 level. The traditional course 
and the field courses produce an estimated 
improvement in a student's ability to apply 
microeconomics with a significance of .20; 
the influence of the graduate micro course 
is even less significant. 

The results explaining students' 
knowledge of macro theory and their ability 



to apply it lend support to the contention 
that it is difficult to separate the two scores: 
the results are very similar. The results also 
show that math ability does not play a 
significant independent role. 

Table 2 shows the significance level of 
the graduate macro course in explaining the 
macro theory score on the general examina- 
tions is .001, while the traditional course 
and the macro field courses are significant 
at the .05 level. Taking the graduate macro 
course produces an estimated 21 percent 
increase in exam score, while the impact of 
the traditional course is 1 1 percent and the 
effect of each macro field course is an esti- 
mated 4 percent. 

The large differences in the macro theory 
and applications scores seem to imply that 
the traditional course and the field courses 
have a less significant effect on students' 
ability to apply macro concepts. It should 
also be noted that the R² of the applications 
equation is substantially below that of the 
macro theory equation. This implies that 
the ability to apply macro concepts is more 
difficult to measure and/or is determined 
more by the ability factors not controlled 
for, than is knowledge of macro theory. 



Table 2 

Micro equations (micro course variables)

      Constant     Introductory Traditional  Policy       Graduate     Field        Math         Verbal       R²
                   Course       Intermediate Intermediate Intermediate Courses      Score        Score
                   Grade        Course       Course       Course       Taken
                                (dummy)      (dummy)      (dummy)      (number)

Dependent variable (log): Micro Theory Score
(1)   1.11         [?]          .219         .418         [?]          .094                                   .14
      (4.12)       (.83)        (1.92)       (2.47)       (2.16)       (1.69)
(2)   -7.21        -.063        .129         .160         .102         .056         1.261        [?]          .31
      (3.39)       (.60)        (1.09)       (2.16)       (1.46)       (1.67)       (2.82)       (.25)
(3)   -6.91                     .126         .160         .100         .057         1.177        .112         .31
      (3.16)                    (1.07)       (2.17)       (1.46)       (1.71)       (2.?9)       (.10)

Dependent variable (log): Micro Application Score
(4)   [?]          -.018        .303         [?]          .184         .096                                   .06
      (3.92)       (.14)        (1.97)       (1.71)       (1.44)       (2.22)
(5)   -9.26        [?]          .168         .294         .119         .047         1.859        [?]          .27
      (3.58)       (1.71)       (1.17)       (1.44)       (.47)        (1.15)       (1.41)       (.29)
(6)   -8.20                     .154         .294         .114         [?]          [?]          -.077        [?]
      (3.22)                    (1.06)       (1.41)       (.45)        (1.21)       (2.98)       (.17)

Macro equations (macro course variables)

      Constant     Introductory Traditional  Graduate     Field        Math         Verbal       R²
                   Course       Intermediate Intermediate Courses      Score        Score
                   Grade        Course       Course       Taken
                                (dummy)      (dummy)      (number)

Dependent variable (log): Macro Theory Score
(7)   1.34         .174         .107         .218         [?]                                    .37
      (7.03)       ([?])        (1.90)       (3.12)       (1.78)
(8)   1.22         .173         .107         .214         .040         .101         -.085        .38
      (1.07)       (1.98)       (1.87)       (3.01)       (1.80)       (.46)        (.41)

Dependent variable (log): Macro Application Score
(9)   1.58         .073         .086         .211         .040                                   .28
      (6.91)       (.74)        (1.27)       (2.91)       (1.54)
(10)  .78          .054         [?]          .211         .042         .129         .002         .29
      ([?])        (.51)        (1.17)       (2.52)       (1.58)       (.48)        (.01)

Note: Absolute values of t statistics are shown in parentheses. To obtain all data, sample was restricted to 94 students. Therefore significance levels 
(one-tail test) are approximately: t = 2.37, significance level = .01; t = 1.99, significance level = .025; t = 1.66, significance level = .05; t = 1.29, 
significance level = .10. 
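
The one-tail critical values listed in the note to Table 2 can be applied mechanically to any reported t statistic. The following is a minimal sketch, assuming only those four tabulated cut-offs; the function name `one_tail_level` and the example inputs are illustrative.

```python
# One-tail critical values as given in the note to Table 2.
CRITICAL_VALUES = [(2.37, 0.01), (1.99, 0.025), (1.66, 0.05), (1.29, 0.10)]

def one_tail_level(t_stat: float):
    """Return the most stringent tabulated level that |t| reaches, or None."""
    t = abs(t_stat)
    for critical, level in CRITICAL_VALUES:
        if t >= critical:
            return level
    return None

# Example inputs chosen for illustration.
for t in (2.17, 1.71, 1.46, 1.07):
    print(t, one_tail_level(t))
```

A result of `None` means the statistic falls short of even the .10 cut-off; the text's references to a ".20 level" lie outside the tabulated values and are not covered by this lookup.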
III. Summary and Conclusions 

There are numerous reasons for statis- 
tically insignificant results and, therefore, 
there is great risk in placing too much em- 
phasis on the empirical evidence presented 
here. However, those concerned with what 
economics majors learn can take little 
comfort from this analysis. 

The traditional microeconomics course 
appears to have a small impact on either 
students' knowledge of micro theory or 
their ability to apply it to real world prob- 
lems at the end of their college careers. One 
of the alternatives offered to Harvard 
students, a graduate-type highly theoretical 
course, does a bit better in adding to the 
students' knowledge of micro theory, but is 
worse when it comes to teaching them to 
apply the theory. Only the policy oriented 
micro course has a verifiable effect on the 
students' understanding of micro theory; 
even so, its impact on their ability to apply 
micro concepts is below usually acceptable 
significance levels, once adjustments are 
made for mathematical ability. 

The traditional intermediate macro 
course gives evidence of improving macro 
theory scores, although its effect on ability 
to apply those concepts is below usual 
significance standards. The graduate level 
macro course, which is oriented to policy 
and not particularly mathematical although 
demanding, is a highly significant explainer 
of student knowledge. Finally, the micro 
and macro field courses tend overall to 
have a moderate impact on student scores. 

At the risk of overemphasizing these 
admittedly rough statistical conclusions, it 
appears that there is a problem with the 
economics major. After years of study and 
improvement in the beginning course, it 
seems time to begin a similar effort with 
respect to the rest of the undergraduate 
program. It is not surprising that with a be- 
ginning course which has become a 



thorough and rigorous introduction to the 
discipline, students find the traditional 
courses which follow repetitious. They may 
be mistaken and, in any event, repetition 
may be quite valuable, but the evidence 
available does not substantially dispute 
their conclusions. Study and innovation, 
which are taking place in traditional 
courses at Harvard in response to the 
changes in the knowledge possessed by 
students emerging from the introductory 
class, must be encouraged. At a minimum, 
new methods of presenting information 
should be devised to prevent student 
dissatisfaction. From Harvard's experience 
with alternative courses, it would appear 
that at least the average student is capable 
of learning significantly more difficult ma- 
terial, by reading more extensively and 
working on more independent assignments, 
than is normally taught in intermediate 
courses. On the other hand, it seems 
unproductive to offer courses too much like 
those given for graduate students because, 
without experience in problem solving, 
undergraduates do not learn to apply the 
theory they are taught. 

As indicated at the start, this paper 
represents just an initial step. The conclu- 
sions are necessarily imprecise and must be 
sharpened by study of systems at other 
schools. Until such broader information is 
available, application of these results to 
other than the specific courses studied here 
is quite risky. The effects on student learn- 
ing of courses other than those in micro and 
macro and the impact of the economics 
program on knowledge retained years after 
graduation are also areas in which further 
research is necessary. 
TIPS and Technical Change in 
Classroom Instruction 



By Allen C. Kelley* 



This paper presents some research re- 
sults on the efficiency of a teaching tech- 
nique (TIPS) I have used in the Principles 
of Economics course at the University of 
Wisconsin-Madison. My presentation will 
be extremely selective; only the results of 
five out of some ten different output mea- 
sures are reported. I shall argue that TIPS 
represents an improvement over the commonly 
employed lecture-discussion classroom 
technology. The model I use to evaluate 
the efficiency of TIPS is broader than 
that usually used in educational research, 
since it takes into account not only the 
total magnitude of the benefits and costs, 
but also their distribution. 

TIPS 

TIPS (Teaching Information and Pro- 
cessing System) is a testing and evaluation 
system which provides the capability of 
increasing the level of individualized in- 
struction in the classroom (Kelley, 1968, 
1970). TIPS enables the instructor to 
prepare, administer, and process short 
multiple-choice "surveys" on a regular 

* Professor of Economics, University of Wisconsin- 
Madison. This research was supported by the Esso Edu- 
cation Foundation. The Graduate School of the Uni- 
versity of Wisconsin provided substantial computer 
time. The assistance of James K Matson, Robert M. 
Schmidt, and Michael L. Wiseman should be con- 
spicuously acknowledged. I have also benefited from 
the comments of Rendigs Fels, W. Lee Hansen, and 
Burton A. Weisbrod. Given space limitations, the present 
paper has been severely condensed. An expanded 
version is available from the author upon request. 



basis throughout the semester. Based 
on the results of each survey and on instructions 
or "decision rules" provided by 
the professor, a series of instructional re- 
ports are prepared and printed by data 
processing equipment. Under normal circumstances 
the surveys are given once every 
week and require ten to fifteen minutes of 
class time. To date the surveys have not 
been used for grading; they have been 
administered to provide interim informa- 
tion used to diagnose student difficulties 
and to prescribe remedies before major 
examinations take place. 

Three major sets of instructional reports 
are generated by TIPS. A Student Report, 
prepared for each student in the class, is 
available three to four hours after the 
student leaves the classroom. This report 
provides individualized assignments for 
each student based on his measured pro- 
ficiency on the various concepts covered 
on the TIPS survey. A student performing 
well on one concept may receive an enrich- 
ment and/or optional assignment; on an- 
other concept, where deficiency is revealed, 
he may receive a lower-level required as- 
signment. Assignments may also be based 
on the student's learning skills, e.g., his 
mathematical versus his verbal ability. 
The TIPS survey results permit identify- 
ing well before formal examinations those 
students who are failing the course; individual 
tutorials and compensatory instruction 
may then be arranged. Superior students 
may be provided the option of 
writing papers or engaging in research in 
lieu of taking the exam. 
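
The way a survey result and a set of instructor-supplied decision rules might be turned into a Student Report can be sketched as follows. This is a hypothetical illustration, not the actual TIPS software: the `Rule` class, the cut-off values, the concept names, and the assignment titles are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical decision rule for one concept on a survey."""
    concept: str
    remedial: str            # required lower-level assignment
    enrichment: str          # optional assignment for proficient students
    pass_mark: float = 0.5   # assumed cut-offs, not from the paper
    high_mark: float = 0.9

def student_report(scores: dict, rules: list) -> list:
    """Map a student's share of correct answers per concept to assignments."""
    report = []
    for rule in rules:
        share = scores.get(rule.concept, 0.0)
        if share < rule.pass_mark:
            report.append(f"REQUIRED: {rule.remedial}")
        elif share >= rule.high_mark:
            report.append(f"OPTIONAL: {rule.enrichment}")
    return report

rules = [
    Rule("elasticity", "Problem set A (basic)", "Reading on tax incidence"),
    Rule("opportunity cost", "Problem set B (basic)", "Short essay topic"),
]
print(student_report({"elasticity": 0.25, "opportunity cost": 1.0}, rules))
```

Keeping the rules as data separate from the reporting function mirrors the description above, in which the professor supplies the rules and the processing step applies them to each student's results.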

Summary reports are prepared for both 
the professor (covering class performance) 
and the teaching assistant (covering the 
results of each TA's sections). These re- 
ports provide the feedback required by the 
instructor to modify his class assignments 
and presentations in response to revealed 
deficiencies and strengths. The TA reports 
also provide a list of assignments required 
of each student, together with lists of stu- 
dents who, for example, are required to 
establish tutorials or who have been pro- 
vided the option of writing papers and 
engaging in term-paper research. 

TIPS employs some of the oldest prin- 
ciples of instruction but uses modern tech- 
nology to provide each student a course of 
instruction appropriate to his needs. The 
degree of individualized instruction facilitated 
by TIPS is largely invariant to class 
size. This approach is applicable to a wide 
range of disciplines where course objectives 
can in large part be measured by well- 
designed objective-type questions. The 
system is designed to attenuate instruc- 
tional problems inherent in higher educa- 
tion where expanding enrollments and 
rising costs are accompanied by large class 
sizes and where student abilities and inter- 
ests span a wide spectrum. 

A Model of Educational Evaluation 

The model of educational evaluation 
employed below emphasizes the distribu- 
tion, as well as the total magnitude, of 
benefits and costs associated with alterna- 
tive instructional approaches (Hansen 
et al.). The pervasive failure to consider 
distributional issues in educational evalua- 
tion is tantamount to assuming either that 
students are a homogeneous group — each 
student receives the same amount of outputs 
and sustains the same amount of 
costs, and the outputs and costs are valued 



equally for each student — or that students 
should be treated as if they were a homo- 
geneous group. Two implications of these 
observations relate to the appraisal of the 
literature evaluating educational tech- 
nology. First, there is an abundance of 
studies which fail to identify a statistically 
significant impact on student performance 
of alternative educational approaches; this 
may, as McKeachie has observed, result 
from the fact that "methods optimal for 
some students are detrimental to the 
achievement of others" (p. 1157). Second, 
if students do benefit differentially from 
alternative teaching techniques, the sta- 
tistical models which omit these distribu- 
tional effects are misspecified; they pro- 
duce statistically biased and typically un- 
interpretable results. 
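
A small numerical example shows how offsetting effects across student types can hide behind a zero average. The scores below are invented for illustration and the helper names are hypothetical; the point is only that a pooled comparison can report no effect while each type of student is affected, in opposite directions.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical records: (student type, taught by new method?, score).
records = [
    ("A", True, 75), ("A", True, 77), ("A", False, 70), ("A", False, 72),
    ("B", True, 65), ("B", True, 67), ("B", False, 70), ("B", False, 72),
]

def effect(rows):
    """Difference in mean score between treated and untreated rows."""
    treated = [s for _, t, s in rows if t]
    control = [s for _, t, s in rows if not t]
    return mean(treated) - mean(control)

pooled = effect(records)
by_type = {k: effect([r for r in records if r[0] == k]) for k in ("A", "B")}
print(pooled, by_type)
```

In this constructed case the pooled difference is zero, while type A students gain five points and type B students lose five, which is the pattern the McKeachie quotation describes.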

The Experimental Evaluation of TIPS 

In the fall of 1968 TIPS was utilized in 
an experiment in the Principles of Eco- 
nomics course at the University of Wis- 
consin-Madison. The objective was to 
assess the impact of TIPS on student 
achievement, student attitudes toward the 
course and TIPS, and student retention 
of economic principles (measured one year 
later). Students in the experimental group 
employed TIPS during the first eight 
weeks of the course. Students in the control 
group were taught using a lecture-TA-homework 
assignment format thought 
representative of that widely employed in 
the principles course. The control and ex- 
perimental student groups met with their 
professor three times a week; the fourth 
session, a discussion led by a graduate 
teaching assistant, met in smaller groups of 
fifteen to twenty-five students. The total 
amount of homework assignments in the 
two groups was approximately the same. 
In the control group all students received 
an identical "average" assignment. In the 
experimental group the students in difficulty 
received larger and lower-level assignments; 
those demonstrating proficiency 
were given optional, ungraded 
assignments. 

The control and experimental lectures, 
each comprising about 250 students, met 
with the same instructor at contiguous 
lecture hours in different buildings. In both 
groups students received almost identical 
lectures. TA's were randomly selected 
from the Departmental "pool" and were 
assigned to one of the two lectures without 
bias. Identical text materials were re- 
quired. Subsequent analysis reveals that a 
minimum of student interaction between 
the two groups occurred; furthermore, 
statistical tests show that the two lecture 
groups possessed a statistically identical 
distribution of attributes: aptitude, prior 
academic achievement, sex, academic major, 
class, and mathematics background. The 
"Hawthorne effect," likely present, was 
attenuated by the procedure of briefing 
students in the TIPS class with the use of 
a nonpromotional, printed document de- 
scribing the system and the experiment. 
This avoided any tendency toward over- 
emphasis of the experiment by the instruc- 
tor. 

Considerable care was taken to obtain 
output measures which were valid and 
unbiased. A two-hour mid-term examina- 
tion contained twenty multiple-choice 
questions drawn from those provided in 
the instructor's manual to the text; none 
of the questions had appeared on TIPS 
surveys. The student also answered five 

short-answer questions of an applied-problem 
type, and had a choice of one of 
two long essays. The short-answer questions 
and essays were equally divided between 
questions submitted by TA's in the 
two lectures. Students in both lectures 
were administered identical examinations 
at the same hour (different buildings). 
Elaborate precautions were taken to ensure 
objectivity of grading: multiple choice 
questions were machine graded; the remaining 
portions of the tests were anonymized 
by removing student names and 
assigning a numerical code for subsequent 
reassembly. All responses to a particular 
question by students from both lectures 
were graded by a single TA. Undoubtedly 
the grading possessed significant variance 
in terms of accuracy; we assert, however, 
that there was no bias which would preclude 
an objective appraisal of the impact 
of TIPS. 

Impact of TIPS on Student Achievement 

Space precludes the presentation of the 
theoretical model underlying the analysis, 
the results for each of the output measures, 
a discussion of the estimation format employed, 
and a defense and interpretation 
of each variable included in the regression. 
The results presented in equation (1) for 
the aggregate score from the first midterm 
examination are representative of the output 
measures over that portion of the 
course when TIPS was used. Least squares 
regression procedures were employed; t 
statistics are in parentheses. 

$$
\begin{aligned}
O ={} & \underset{(5.18)}{18.35} + \underset{(5.98)}{.17}\,\mathit{ACTSAT} + \underset{(2.85)}{.08}\,\mathit{LogHSP} + \underset{(3.08)}{3.23}\,\mathit{SOPH} + \underset{(2.59)}{3.95}\,\mathit{UPPER} + \underset{(.28)}{.31}\,\mathit{MATH} \\
& + \underset{(.67)}{.94}\,\mathit{PSEXG} + \underset{(1.63)}{1.84}\,\mathit{BUS} + \underset{(.62)}{1.16}\,\mathit{ECON} + \underset{(3.11)}{.71}\,\mathit{COMASGN} + \underset{(2.98)}{.09}\,\mathit{PCTSECT} \\
& + \underset{(2.17)}{1.56}\,\mathit{ASNDONE} - \underset{(-1.66)}{.30}\,\mathit{LogASNDONE}\cdot\mathit{ACTSAT} + \underset{(3.26)}{38.35}\,\mathit{TIPS} - \underset{(-2.15)}{3.96}\,\mathit{ASNDONE}\cdot\mathit{TIPS} \\
& + \underset{(1.77)}{.82}\,\mathit{ASNDONE}\cdot\mathit{LogACTSAT}\cdot\mathit{TIPS} - \underset{(-2.20)}{6.39}\,\mathit{LogACTSAT}\cdot\mathit{TIPS}
\end{aligned}
\tag{1}
$$

$$
\bar{O} = 52.09, \qquad R^2 = .34
$$
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Student achievement was positively and 
significantly related to the number of 
homework assignments completed (ASN- 
DONE), the percentage of sections at- 
tended (PCTSECT), whether the student 
was a sophomore or an upperclassman 
(SOPH, UPPER), his ACT or SA T score, 
his graduating high school percentile rank- 
ing (HSP), the difference between the 
number of I required assignments and 
those handed , (COMASGN — a measure 
of commitment?), and whether he was in 
the TIPS class. Neither knowledge of 
calculus {MATH) nor major (PSENG, 
BUS, ECON— physical science or engi- 
neering, business, economics) contributed 
significantly to examination achievement. 

An interpretation of the results in (1) is 
facilitated by comparing the aggregate 
performance of four "representative" stu- 
dents. Table 1 presents the predicted score 
of four students and the percentage con- 
tribution to that score due to the several 
independent variables. Charles Kinbote 
and Jack Gradus are "average" students 
in the TIPS and control groups, respec- 
tively; they are "twins" in all respects 
except TIPS. John Shade is a relatively 
low achiever in the TIPS class; Sybil 
Swallow is relatively bright. 

As in most studies, prior aptitude and 
achievement constituted the most impor- 
tant independent variables, accounting for 
around 25-35 percent of the explained 
variance. Section attendance counts significantly
and positively, although these
results do not necessarily measure the
absolute contribution of the TA's or sections
(i.e., there was no control), but rather
the impact of differential section atten- 
dance, a measure which could be a proxy 
for student attributes such as study disci- 
pline, interest in the course, and so forth. 

Homework assignments were most bene- 
ficial to the relatively slow student as 
measured by ACT-SAT scores; they were 
of little significance for the bright student. 
This result illustrates one way in which 
TIPS could possibly increase the efficiency 
in the use of instructional inputs and, in 
this case, the student's time. Since bright 
students in the TIPS class received very 
few required assignments, TIPS was likely 
instrumental in increasing the productivity
of instructional resources. Not only did
bright students "save" ten to fifteen hours
per semester by not working assignments
of low productivity, but TA's spent no
time grading and recording the results.
Instructional resources were instead shifted 
toward the low-achieving student, where 
the relative productivity of the homework- 
assignment technique appears to be high. 

The contribution of TIPS to student
examination performance was greatest for
the relatively low-achieving student (19.5
percent), falling to 13.3 percent for his
brighter classmate. The impact of TIPS
occurs not only through individualized
homework assignments, but also through
feedback and study discipline. All of these
influences are plausibly more beneficial to
the low achiever. Moreover, given the experimental
design, it is not surprising that
the low-achieving student received greater
benefits from TIPS. A larger share of instructional
resources (grading and TA's
time) was allocated to this student, even
though the total resources employed in each
class (including the student's time) were
roughly the same.

Table 1. Percentage Contribution of Each Factor to the Performance of
Individual Students on the First Midterm Exam

                                                                                                                ASNDONE
                                              Predicted  Inter-  Class,         ACT-   PCT-                     + ACT         TIPS +
Name of Student                               Score      cept    MATH    HSP    SAT    SECT   Major   COMASGN   interaction   Interactions*

John Shade (Slow Student, TIPS Class)         54.63      33.6%   5.0%    10.0%   5.7%  14.4%  3.4%    -0.0%     7.6%          19.5%
Charles Kinbote (Average Student, TIPS Class) 58.36      31.2%   5.5%    11.0%  16.9%  13.3%  3.1%    -0.0%     3.1%          15.8%
Sybil Swallow (Bright Student, TIPS Class)    63.69      28.8%   5.1%    11.4%  24.7%  12.3%  2.9%    -0.0%     1.[?]%        13.3%
Jack Gradus (Average Student, Control Class)  49.51      37.1%   6.5%    13.0%  20.1%  15.9%  3.7%    -0.0%     3.7%

* Interactions = TIPS·ASNDONE + TIPS·ASNDONE·Log(ACT) + TIPS·Log(ACT).
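
The way the TIPS terms of equation (1) combine can be sketched in a few lines. The four coefficients are transcribed from equation (1); the function name, the argument names, and the sample inputs are illustrative assumptions. The source does not state the logarithm base or the scale of ACTSAT, so the inputs are hypothetical and are not meant to reproduce Table 1.

```python
# Coefficients on the four TIPS terms, transcribed from equation (1).
B_TIPS = 38.35
B_ASNDONE_TIPS = -3.96
B_ASNDONE_LOGACT_TIPS = 0.82
B_LOGACT_TIPS = -6.39


def tips_effect(asndone: float, log_actsat: float) -> float:
    """Predicted score with TIPS = 1 minus the score with TIPS = 0,
    holding every other regressor fixed (hypothetical helper)."""
    return (
        B_TIPS
        + B_ASNDONE_TIPS * asndone
        + B_ASNDONE_LOGACT_TIPS * asndone * log_actsat
        + B_LOGACT_TIPS * log_actsat
    )


# Hypothetical inputs: the same number of completed assignments,
# evaluated at two different values of log aptitude.
lower_aptitude = tips_effect(asndone=2, log_actsat=3.0)
higher_aptitude = tips_effect(asndone=2, log_actsat=3.5)
print(round(lower_aptitude, 2), round(higher_aptitude, 2))
```

For any fixed number of completed assignments below 6.39/.82 (about 7.8), the estimated differential falls as log aptitude rises, which is the direction of the distributional result described above.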

Are these measured impacts of TIPS
specific to the "type" of examination question?
The results in Table 2 suggest that
the greatest impact of TIPS was revealed
on short-answer, applied problem-type
questions; moreover, the low-achieving
student gained as much on the essay question
as he did on the multiple-choice questions.
Finally, the distributional impact of
TIPS appears consistently across all types
of questions, although it is most pronounced
on the essay.

Table 2. Contribution of TIPS to Student Score
on the Components of the First Examination

                                     Multiple   Short-     First
Student                              Choice     Answer     Essay

John Shade (low achiever)            16.2%      29.4%      16.5%
Charles Kinbote (average achiever)   13.3%      27.0%       9.7%
Sybil Swallow (high achiever)        10.8%      24.[?]%     6.6%

Other Effects of TIPS 

1. Student Evaluation of TIPS 

Students responded favorably to the 
employment of TIPS. They possessed no
significant hostility to the use of data processing
equipment. Moreover, they urged
that TIPS be used in future economics
classes and in other disciplines as well. The
student's evaluation of TIPS is largely
invariant to his class, major, or ACT-SAT
standing (Kelley, 1968).



2. Student Evaluation of the Course 
and the Professor 

The student's evaluation of the course
and the professor was uninfluenced by 
TIPS. The end-of-course evaluations (pre- 
pared by the Department of Economics 
and by the Wisconsin Student Union) 
yielded virtually identical results in the 
control and the experimental groups. This 
evidence is consistent with the hypothesis 
that the Hawthorne effect was unimpor- 
tant. 

3. The Lasting Effects

Approximately 250 students were re- 
tested one year after the completion of the 
experiment. While the results are not yet 
completely analyzed, preliminary findings 
reveal that the differential TIPS effect is 
maintained over time, although it dimin- 
ishes somewhat in magnitude. This impact 
of TIPS on the retention of knowledge is 
probably attributable to the change in study
habits engendered by the teaching approach.
Students in the TIPS class have
been shown to study and review continuously
throughout the semester, allocating
a relatively smaller share of their time to 
preparing for major examinations (Kelley, 
1968, pp. 451-52). This contrasts with the 
"typical" study pattern of allocating a 
greater proportion of time to examination 
preparation, i.e., cramming. The latter 
study pattern has been shown by psycholo- 
gists to represent a relatively unproductive 
format if knowledge retention is the cri- 
terion of evaluation. 

The most interesting finding on the 
retention of knowledge is that the distri- 
butional effect of TIPS largely disappears. 
If this result stands up to further analysis, 
then clearly the "efficiency" assessments 
made above regarding the relative produc- 
tivity of allocating a disproportionate 
share of instructional resources to the 
lower-achieving student could well be 
modified, and even overturned. 






4. The Number of Majors 

The proportion of the TIPS class selecting
economics as a major, as measured two
years later, was 23 percent higher than
that in the control class. This unexpected
result is somewhat difficult to interpret.
Recall that students appeared to obtain
no differential "enjoyment" of the course
or instructor due to TIPS. Possibly their
greater academic success in the course, by
comparison with their evaluation of it, is
the more important factor in their selection
of a major. 1

Cost of TIPS 

Costs can be divided into six categories: 
1) physical facilities, 2) data processing, 
3) faculty time, 4) student time, 5) TA 
time, and 6) other (secretarial, administra- 
tive, printing, and so forth). A detailed 
examination of the differential total costs 
reveals that there is no significant differ- 
ence between the per student cost in the 
TIPS and the control lectures. This somewhat
surprising result is obtained from the
fact that the increased direct costs of the
system (computer time, professor's time in
survey preparation, printing) are largely
offset by the more efficient use of existing
classroom resources (TA grading time). If
an evaluation of the student's time 
"saved" or released by TIPS is taken into 
account, then TIPS, as implemented dur- 
ing the experiment, would have econo- 
mized on total instructional resources. 

Research to date has not yet determined
the effects of TIPS on the distribution of
costs. The major distributional impact
occurs in the allocation of TA and student
time. To the extent that TIPS is not used
for enrichment purposes, then students of
lower achievement are, on the one hand,
incurring greater time costs and, on the
other hand, receiving a disproportionate
share of the benefits and instructional
resources.

1 The same percentage distribution of A's, B's, and
C's was awarded to each class. A difference in letter
grades in this range did not therefore account for the
larger number of majors from the TIPS class. This
grading procedure was considered necessary to ensure
student cooperation during the experiment.

TIPS and Economic Efficiency

It is possible to form a preliminary
appraisal of the efficiency of TIPS as used
in this particular experiment. TIPS pro- 
duces increased output for most students 
although, as implemented, more output 
was distributed to the relatively low- 
achieving student. Assuming a positive 
value of marginal output, then the sign of 
the total value of output is positive and is 
uninfluenced by the distributional effects 
of this instructional technique. However, 
TIPS's distributional impact influences 
the size of the value of total benefits. 

While the total cost of TIPS is approxi- 
mately the same as in the traditional class- 
room format, a higher cost was assumed 
by the low-achieving student. Assuming 
that the opportunity cost of the time of 
this student is less than or equal to that of 
his brighter counterpart, I can conclude 
that TIPS is a more efficient technique 
than the traditional classroom framework. 

Several qualifications are in order. First, 
these conclusions are based largely on the 
course examination measures. Other mea- 
sures, including output-added, measures of 
intellectual curiosity, or critical thinking, 
may yield quite different findings. Second, 
the value of the output depends on who is 
doing the valuing. While faculty may be 
inclined to value strongly the impact of 
TIPS on retained student achievement of 
economics, students, in contrast, plausibly 
place a relatively high weight on course 
"enjoyment," somehow measured. (We 
have concluded that course "enjoyment" 
is largely invariant to TIPS use.) Alternatively,
even the most enlightened departmental
chairman, while responsive to
achievement and course evaluations, will
place some positive (relatively high?)
weight on the "economics-majors" output.
Finally, my results apply to one experi- 
ment, with one instructor, in a single uni- 
versity, and in a particular course. Even 
if we assume that the experiment is meth- 
odologically sound, and that analytically 
sensible theoretical and statistical models 
were employed, the ability to generalize 
from this single experiment is limited. We 
would be interested in replicating this ex- 
periment in other courses, disciplines, and 
environments. Moreover, these experi- 
ments should ideally be outside the direct 
influence of the researcher. 

A final qualification relates to the pre- 
dicted outcome of replicating TIPS in 
other settings. A wide variation of TIPS 
impacts is likely to be identified. The impact
of TIPS is in large part a reflection
of the relative success with which the professor
correctly selects the appropriate
teaching instruments and test items. Given
the paucity of scientifically based findings
on the relative productivity of alternative
teaching approaches and materials, the
full potential of a TIPS-like approach to
instruction will not be revealed until major
advances are made in the more fundamental
instructional areas of testing, diagnosis,
and prescription. If the past can be taken
as a rough guide to the future, notable
advances in these areas of instruction, in
economics or any other discipline, are not
likely to be just around the corner.

A Training System for Graduate Student
Instructors of Introductory Economics
at the University of Minnesota

Darrell R. Lewis and Charles C. Orvis 



College and university teaching is . . . the only profession (except the proverbially
oldest in the world) for which no training is given or required.

—Jacques Barzun, 1968. 

This paper reports on an evaluation of the effectiveness of a three-part training system
developed at the University of Minnesota for assisting graduate student instructors
(GSI) of the introductory course in economics.

Historically, the University of Minnesota (along with most other large universities)
has made extensive use of graduate students in the teaching of undergraduate economics.
There are no prerequisites for teaching the principles course other than being enrolled
as a second year graduate student in economics and being eligible for financial aid.
Until three years ago, the department simply provided a syllabus, a textbook with instructor's
manual and section assignments with room numbers, and turned the graduate
student instructors loose in the classroom.

However, during the past three years an integrated series of student evaluations, 
videotaped classroom observations and instructional seminars have been developed for 
training and assisting the 22 graduate students who are providing the instruction for 44 
sections of the introductory course at the University of Minnesota. 

In the fall of 1970, in response to a request from the Graduate Economics Club as 
well as the Economics Department's own desire to improve instruction, a series of de- 
partmental seminars on the teaching of economics for all new GSIs was initiated. Simul- 
taneously, the possibility of videotaping individual instructors was announced by the 
University's Radio and Television Department. Since this seemed to offer unexplored 
possibilities for the improvement of instruction, the project was integrated into the 
seminar concept. Approximately 20 GSIs and senior faculty were videotaped, critiqued 
and reviewed by selected members of the department. Some of the tapes were also used 
as part of the seminar, to show others the benefits of videotaping and to demonstrate
various teaching techniques. A student evaluation questionnaire (with specific instructor
performance criteria identified) was also administered to each GSI's class to provide
additional feedback to each instructor as to his performance.

Throughout the first year both participation and feedback from the GSIs had been
excellent. However, a basic question remained: Were we having any measurable impact
on both student and instructor performances in the classroom? To resolve this question,
the following study was conducted. 1

Darrell R. Lewis is Professor of Economic Education and Charles C. Orvis is a Research Fellow
at the University of Minnesota. The authors wish to acknowledge the invaluable assistance in
the developmental stages of this project by John Scarbrough and Ray Riezman, Research Fellows
at the University of Minnesota.



Experimental Design 

During the 1971 fall quarter, all students enrolled in Economics 1-001 (Principles
of Economics: Macroeconomics) were selected as a control population. This total population
was divided into fourteen sections which met three times each week as a section
and once a week for a mass lecture. Enrollment in each of the sections was essentially
a self-selection process on a first-come, first-served basis. Students were not aware of
which staff member would be assigned to any of the sections being offered between 8:00
a.m. and 4:00 p.m., Monday through Friday. The average size of the sections was [illegible].
The mass lecture was handled by senior faculty while the fourteen sections were conducted
by seven graduate student instructors (GSI), each teaching two sections.

During the fall quarter, the seven GSIs were precluded from participating in or
having knowledge about the videotaping or seminar. Similarly, these seven instructors
for the control groups and their students were unaware of both the experimental design
and the hypotheses being tested. However, all of the Economics 1-001 students in the fall
term responded to questionnaires dealing with student characteristics and were pre- and 
posttested on the Test of Understanding in College Economics (Part I, Forms A and B). 
Postcourse student evaluations of each instructor's performance were also collected on 
the Purdue Rating Scale for College Instructors. 2 

The Purdue Rating Scale for College Instructors is a recently developed semantic
differential questionnaire with 27 items. Each question associates with one of five
factors representing the instructor's (1) personal characteristics, (2) objectivity, (3) exposition,
(4) tests and grades, and (5) subject matter knowledge. Similarly, each question
is posed in such a fashion as to give description for appropriate corrective action; i.e.,
each is expressed in performance criterion terms. Reliability tests and appropriate validation
for the instrument have been produced at Purdue University.

In order to control for the experimental training of instructors, the same seven GSIs
were used as the experimental group during the winter quarter when Economics 1-001
was again offered. The experimental group of 438 winter quarter students was again
divided into fourteen sections with an average section size of 31. As with the control students,
all the winter quarter experimental students responded to the questionnaire, the
Purdue Rating Scale, and the TUCE. Subsequent tests on selected student characteristics
and pre-TUCE scores revealed no significant differences between the control and experimental
groups (see Table 1 and study results below). All sections and instructors in both
the fall and winter quarters used the same instructional materials, senior faculty for the
mass lectures, and departmental course syllabus.

The experiment was designed in such a way that the seven instructors were randomly 
selected from a total of 22 GSIs in the fall of 1971. The seven instructors were then given 
only a syllabus and section assignments and were not provided with any other assistance 
or training— i.e., the norm for most GSIs at large universities. However, during the win- 
ter quarter these same seven GSIs were systematically exposed to the department's three- 
part training system. 3 

The GSI Training System 4 

Each week throughout the winter quarter the seven experimental instructors met 
together in an informal seminar with the authors of this study and another senior faculty 
member from the economics department. Such topics as the purpose and scope of intro- 
ductory economics, student-teacher interaction and discussion techniques, teaching 
techniques for various concepts, integration of supplemental readings and lectures with 
the syllabus and text, orientation and familiarization with the literature on teaching at
the college level, introduction to the economic education literature at the collegiate level, 
how to plan and establish learning objectives for each class or unit, and how to construct 
tests and measure student performance made up the content of the seminar. 








As a second component of the GSI training system, each instructor was videotaped 
three times during the quarter. Following each 45-minute videotaped class session, ap- 
proximately two hours were spent reviewing and critiquing each tape with the individual 
instructor. 5

In conjunction with the videotaping, two instruments were developed to assist both
the instructor and the reviewing procedure. 6 Prior to each class designated for videotaping,
the instructor completed a questionnaire directed to the objectives, content and
techniques expected to be covered during the class period. This information was subsequently
reviewed and compared with the videotape during the critiquing session.

The second instrument was constructed so as to measure actual instructor per- 
formance from the videotape. Prior to the review session with each instructor, a specially 
trained graduate student in economics previewed and coded each tape at twenty-second 
intervals according to a specially adapted observation scheme which measured (a) the 
method employed (lecture, question/problems, discussion, other), (b) the learning ob- 
jectives (knowledge of facts, theoretical concepts, exposition on theory, simple applica- 
tion, complex application) and (c) the verbal and nonverbal expressions (supportive, 
receptive, neutral, unresponsive, disapproval). The data were then summarized and pre- 
sented to the GSI during the review session. 7 
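
The summary step described above can be sketched as a simple tally. The interval length, the session length, and the method categories come from the text; the function names and the sample session are assumptions made for illustration, and the counts in it are invented rather than taken from any tape.

```python
from collections import Counter

INTERVAL_SECONDS = 20
CLASS_MINUTES = 45

# Category labels taken from the observation scheme described above.
METHODS = ("lecture", "question/problems", "discussion", "other")


def intervals_per_session(minutes: int = CLASS_MINUTES,
                          step: int = INTERVAL_SECONDS) -> int:
    """Number of coded intervals in one videotaped session."""
    return minutes * 60 // step


def summarize(codes: list[str]) -> dict[str, float]:
    """Share of coded intervals falling in each method category."""
    counts = Counter(codes)
    total = len(codes)
    return {method: counts[method] / total for method in METHODS}


# Hypothetical session: these counts are invented for illustration only.
session = ["lecture"] * 90 + ["question/problems"] * 30 + ["discussion"] * 15
shares = summarize(session)
print(intervals_per_session(), shares)
```

The same tally could be repeated for the learning-objective and expression codes, giving the reviewer three frequency tables per tape.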

The data from this latter instrument have proven valuable in at least two regards. 
Instructors, like students, respond to those things which are being measured. Secondly, 
the instrumentation was able measurably to confirm or reject those things the instructor 
said (thought) he was doing in his classroom. It also reinforced the reviewer's intuitive 
critique and comments. 

The third facet of the training system involved student evaluations of the instructors' 
performance. As discussed above, the Purdue Rating Scale for College Instructors was 
given to all students in each of the seven GSIs' fall classes. The results were then discussed
during the winter quarter videotaping review sessions with each of the instructors.
Suggestions and strategies for improvement for each low-rated item were then developed.
This instrument and this procedure facilitated not only the training of GSIs, but provided
for built-in evaluative comparisons with the subsequent winter quarter instructor ratings.

Description of Experimental Results 

As Table 1 indicates, the winter quarter experimental group did not differ significantly
from the fall quarter control group in any of five matching variables (i.e., Sex,
Age, Cumulative Grade Point Average, ACT Score, and Pre-TUCE) at the two-tailed
.05 criterion level being employed in this study. Consequently, with the same instructors 
teaching in both the fall and winter quarters, the groups were considered adequately 
matched for the purposes of this study. 
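
A comparison of this kind can be sketched from summary statistics alone. The means and standard deviations below are the Age and Pre-TUCE entries of Table 1, and the group sizes are those reported with Table 2. The unequal-variance form of the statistic and the function name are assumptions; the source does not say which form of the test was used.

```python
from math import sqrt


def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for a difference in means, unequal-variance form."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


N_FALL, N_WINTER = 323, 438  # group sizes as reported with Table 2

# (fall mean, fall S.D., winter mean, winter S.D.) from Table 1
age = (20.49, 3.09, 20.80, 2.74)
pre_tuce = (13.52, 3.70, 13.04, 3.96)

t_age = two_sample_t(age[0], age[1], N_FALL, age[2], age[3], N_WINTER)
t_pre = two_sample_t(pre_tuce[0], pre_tuce[1], N_FALL,
                     pre_tuce[2], pre_tuce[3], N_WINTER)
print(round(abs(t_age), 2), round(abs(t_pre), 2))
```

Under these assumptions both statistics land close to the 1.43 and 1.71 shown in Table 1, and both fall short of the two-tailed .05 critical value of roughly 1.96.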

The Pre-TUCE data in Table 1 also indicate that the Minnesota scores for both the
experimental and control groups approximate the national norm of 13.24 at the outset 
of each quarter term. Post-TUCE scores for the fall quarter control group also approxi- 
mate the national norm of 19.08, further indicating normality for the control sec- 
tions (8). 

Impact of the Training System on Student Learning 

As Table 1 indicates, the winter quarter experimental students clearly outperformed
the control students in economic understanding. Not only were the differences between
group Post-TUCE scores significant, but the Change-in-TUCE scores (Post-TUCE
minus Pre-TUCE) also indicated significant differences. The experimental group exhibited
a 5[?] percent gain over their Pre-TUCE score while the control group experienced
only a 43 percent gain in output added. 8 The gains for the control group are comparable
to the national norming data for the TUCE wherein students from four-year colleges
showed average gains of 40.3 percent. The experimental group's performances were
clearly superior.

Table 1

Description of Student Characteristics, Performances and Evaluations:
Fall and Winter Quarters

                                            Fall Quarter        Winter Quarter
                                            14 Sections         N = 438, 14 Sections    t-test
Variables                                   Mean      S.D.      Mean      S.D.          Comparing Means

Sex (0, 1), Male = 1                          .71      .45        .70      .46            .30
Age (17-38)                                 20.49     3.09      20.80     2.74           1.43
Grade point average (0-4)                    2.73      .51       2.77      .48           1.09
ACT score (0-36)                            24.88     3.18      24.46     3.63           1.50
Pre-TUCE (0-33)                             13.52     3.70      13.04     3.96           1.71
Post-TUCE (0-33)                            19.46     4.70      20.11     4.53           1.97†
Change-in-TUCE                              [?]       [?]       [?]       4.67           3.36††
Average instructor rating
  (1-6), 1 = very low                        4.11      .84       4.46      .78           5.68††
Rating scale subparts:
  1. Personal evaluation
     (1-6), 1 = very low                     4.41      .90       4.76      .85           5.42††
  2. Objectivity evaluation
     (1-6), 1 = very low                     4.32      .91       4.70      .86           5.82††
  3. Exposition evaluation
     (1-6), 1 = very low                     3.82     1.10       4.32      .99           6.45††
  4. Testing evaluation
     (1-6), 1 = very low                     3.89      .95       4.10      .92           3.05††
  5. Knowledge evaluation
     (1-6), 1 = very low                     4.26      .93       4.52      .81           4.02††

† Significant at the .05 level.
†† Significant at the .01 level.
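
The percentage gains discussed above can be approximated from the Pre-TUCE and Post-TUCE means in Table 1. Working from group means is an assumption; the authors may have averaged individual student gains, which would account for small differences from the percentages quoted in the text. The function name is illustrative.

```python
def percent_gain(pre_mean: float, post_mean: float) -> float:
    """Gain from pre-test to post-test as a percentage of the pre-test mean."""
    return 100.0 * (post_mean - pre_mean) / pre_mean


# Pre-TUCE and Post-TUCE group means from Table 1.
control = percent_gain(13.52, 19.46)
experimental = percent_gain(13.04, 20.11)
print(round(control, 1), round(experimental, 1))
```

On this reading the experimental group's gain exceeds the control group's by roughly ten percentage points.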

Although the significance of the training system's impact on student learning is 
clearly evident from the above-given data and discussion, a more controlled analysis 
can be performed by fitting the student descriptors, evaluations and test results to a 
multiple linear regression model. 

The equation for the model takes the familiar linear form wherein
$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \ldots + b_7X_7 + e$. Post-TUCE, the dependent variable, is assumed to be
linearly related to the following independent predictor variables:

X1 = Pre-TUCE (0-33), continuous
X2 = ACT score (0-36), continuous
X3 = Cumulative grade point average (0-4), continuous
X4 = Age (17-38), continuous
X5 = Sex (1 = male, 0 = female), continuous
X6 = Instructor evaluation (1-6; 1 = very low), continuous
X7 = Class type (1 = experimental in winter quarter, 0 = control), dichotomous

Five of the variables (Pre-TUCE, ACT, GPA, Age and Sex) were chosen on the 
basis of past research, generalizability and their qualities as educational realities which 
teachers of the introductory course must accept (and can measure) when trying to influ- 
ence student learning. 9 The last two variables included in the model (Instructor evalua- 
tions and Class type) were unique to the situation studied. 

The type of class attended (experimental or control) was the key variable in this
study. In the multiple linear regression model of this study, the coefficient of this variable 
measures the residual contribution of the GSI training system to student achievement in 
introductory economics. 
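
A short derivation shows why the class-type coefficient can be read this way. It assumes only the linear form of the model and that the error term has zero mean given the predictors; the symbol $\Delta$ is notation introduced here for the difference.

$$
\begin{aligned}
E[Y \mid X_7 = 1, X_1, \ldots, X_6] &= a + b_1X_1 + \ldots + b_6X_6 + b_7, \\
E[Y \mid X_7 = 0, X_1, \ldots, X_6] &= a + b_1X_1 + \ldots + b_6X_6, \\
\Delta &= b_7 .
\end{aligned}
$$

With the other six predictors held fixed, the predicted Post-TUCE difference between an experimental and a control student is $b_7$, whatever the values at which those predictors are held.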

A stepwise regression procedure was employed which entered each variable into the 
equation in order of significance. The data, as fitted to the multiple linear regression 
model described above, are presented in Table 2. 

Table 2

Regression Results from GSI Training Program
(t-statistic in parenthesis)

Dependent Variable: Post-TUCE
N = 761 Students (323 Control and 438 Experimental)

Independent Variables                              Coefficient

X1  Pre-TUCE (0-33)                                  .34 (7.44)††
X2  ACT (0-36)                                       .36 (6.81)††
X3  GPA (0-4)                                        .02 (4.70)††
X4  Age (17-38)                                      .19 (3.25)††
X5  Sex (0-1; 1 = male)                             1.43 (3.88)††
X6  Instructor evaluation (1-6; 1 = very low)        .43 (2.16)†
X7  Class type (0-1; 1 = experimental
    in winter quarter)                               .71 (2.12)†

Constant                                           -5.24
Standard error                                      3.88
Adjusted R²                                          .30

† Significant at .05 level
†† Significant at .01 level

As Table 2 indicates, when the data were fitted to the multiple linear regression
model, the significance of the earlier t-statistics was confirmed. Controlling for prior
knowledge in economics (X1), mental ability and achievement (X2, X3), maturation (X4),
sex (X5) and student evaluations of the instructor (X6), the type of class (X7) with experimental
involvement in the project did have a significant association with the students'
Post-TUCE scores. 10 The model predicts that a student attending a class which was involved
with the GSI training system would, on the average, score almost three-quarters
of one point (.71) more than nonparticipants on their Post-TUCE scores.

The regression model also indicates that the six other variables significantly associate
with student achievement in economic understanding. Prior knowledge in economics
(X1), mental ability and achievement (X2, X3), maturation (X4), sex (X5) and student
evaluations of the instructor (X6) were all found to be significant. 11 These findings are
all consistent with the results of other research in this field [6].




Impact of the Training System on Instructor Performances 

The data in Tables 1 and 2 also confirm that the GSI training system had measurable
and significant influences on the instructors' actual performances as measured by
student evaluations. Not only was there a significant difference between quarters in the
total Rating Scale for all of the instructors, but each of the subparts to the Rating Scale
was also significantly different between the experimental and control groups. In turn,
these significant changes in GSI performances associated significantly with student learning
as confirmed by the Instructor evaluation variable (X6) in the regression model of
Table 2 with Post-TUCE.

It is important to note that throughout the experimental quarter's videotaping re- 
view sessions each instructor was presented with the Rating Scale (student evaluations) 
results from the previous course. Suggestions and strategies for improvement were then 
developed with each instructor for each low-rated item. The instrument and these pro-
cedures were apparently effective.¹²

Individual instructor ratings are summarized in Table 3 for each of the two groups
with respect to the Rating Scale and its subparts. With one major exception (Instructor
IV), all the GSIs increased their rating scores for almost all the subparts to the Rating
Scale. It is important to note that the only instructor whose ratings dropped (Instructor
IV) developed mononucleosis during the experimental quarter and was the least active
and enthusiastic participant in the training system. This illness and behavior undoubtedly
carried over into his teaching performance as well. In testing, for example, he simply
pulled old exams from his files. It is also interesting to note that Instructor V was an
office mate of Instructor IV and used the same tests as did Instructor IV. Consequent-
ly, both instructors went down in their student ratings dealing with "tests and grades."
Both the students and the Rating Scale instrument are apparently sensitive to such be-
havior and circumstances.
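
The pattern described above can be checked against the "Overall" rows of Table 3. The following is a minimal sketch, assuming the fall and winter overall ratings as printed in that table; the function and variable names are hypothetical and chosen only for illustration.

```python
# Overall ratings per instructor (I-VI), copied from the "Overall" rows of Table 3.
fall = {"I": 3.78, "II": 4.70, "III": 3.95, "IV": 4.67, "V": 4.39, "VI": 3.85}
winter = {"I": 4.14, "II": 5.08, "III": 4.39, "IV": 4.27, "V": 4.51, "VI": 4.44}


def rating_changes(before, after):
    """Return the winter-minus-fall change for each instructor, rounded to two places."""
    return {name: round(after[name] - before[name], 2) for name in before}


changes = rating_changes(fall, winter)

# Instructors whose overall rating fell between the two quarters.
declined = [name for name, delta in changes.items() if delta < 0]

print(changes)
print(declined)
```

A negative change here corresponds to an entry shown in parentheses in the table's "Change from fall to winter" rows.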

The student evaluations, as revealed by the Rating Scale, were also substantiated
in early videotape reviews during the experimental quarter. Both the reviewer in his
observations and the actual videotape coding procedure revealed the same strengths and
weaknesses as the student evaluations of GSI performances. High instructor ratings on
"Personal characteristics" and "Exposition" skills were supported by high coding fre-
quencies on "Supportive" and "Receptive" categories of verbal and nonverbal expres-
sions; high instructor ratings on "Subject matter knowledge" were supported by high
coding frequencies on teaching methods other than "Lecture" and on higher level learn-
ing objectives such as "Complex applications." The consistencies between these two
instruments, along with the actual videotaped observations, were persuasive evidence
in getting the GSIs to change their teaching behavior.

Summary 

This study has confirmed that a systematic teacher training program involving 
Graduate Student Instructors of introductory economics with an integrated series of 
student evaluations, videotaped observations and instructional seminars can have a sig- 
nificant and measurable impact on both student and instructor performances in the class- 
room. Specifically, it was found that as a result of the training system (a) student per- 
formance, as measured by the TUCE, and (b) instructor ratings, as measured by the Pur- 
due Rating Scale for College Instructors, both increased significantly. It was also found 
that instructor ratings, as measured by student evaluations on the Rating Scale, associate 
highly with student performances on the TUCE. 

The experimental efforts and results of this study suggest that other institutions 
and departments of economics can and should undertake greater responsibilities for 
providing their graduate student instructors with teacher training. The specific com- 
ponents of the Minnesota training system are not costly in either set-up terms or in 
maintenance. Most of the developmental costs have already been assumed in the crea-




Table 3

Student Ratings of Instructors

                                                   Average:
                                                      All               Individual Instructors
Subgroups                     Qtr.   Range        Instructors       I     II    III     IV      V     VI

Personal characteristics      Fall   3.90-4.97       4.39        3.90   4.97   4.09   4.89   4.65   4.18
(Questions 2-3-6-17-21-25)    Wtr.   4.33-5.41       4.74        4.33   5.41   4.55   4.54   4.78   4.69
Change from fall to winter                            .35         .43    .44    .46   (.35)   .13    .51

Objectivity                   Fall   3.90-4.91       4.30        3.98   4.91   4.19   4.59   4.60   3.95
(Questions 7-10-19-23)        Wtr.   4.26-5.26       4.68        4.46   5.26   4.56   4.26   4.80   4.59
Change from fall to winter                            .38         .48    .35    .37   (.33)   .20    .64

Exposition                    Fall   2.86-4.59       3.78        3.47   4.59   3.70   4.49   4.12   3.20
(Questions 11-13-14-15-22)    Wtr.   3.82-5.14       4.30        3.82   5.14   4.16   4.16   4.37   4.17
Change from fall to winter                            .52         .35    .55    .46   (.33)   .25    .97

Tests and grades              Fall   3.18-4.50       3.87        3.34   4.32   3.63   4.50   4.25   3.84
(Questions 1-4-9-16-20-27)    Wtr.   3.52-4.62       4.08        3.52   4.62   4.28   3.97   4.12   4.10
Change from fall to winter                            .21         .18    .30    .65   (.53)  (.13)   .26

Subject matter knowledge      Fall   3.28-4.89       4.24        4.23   4.69   4.15   4.89   4.32   4.09
(Questions 5-8-12-18-24-26)   Wtr.   4.17-4.98       4.52        4.55   4.98   4.42   4.44   4.46   4.62
Change from fall to winter                            .28         .32    .29    .27   (.45)   .14    .53

Overall                       Fall   3.45-4.70       4.11        3.78   4.70   3.95   4.67   4.39   3.85
(Averaging above subgroups)   Wtr.   4.14-5.08       4.46        4.14   5.08   4.39   4.27   4.51   4.44
Change from fall to winter                            .35         .36    .38    .44   (.40)   .12    .59

tion of the instruments for evaluation and codification. The fixed costs for the video-
tape equipment total only $2,400. Only three real departmental resource inputs are
needed for maintaining and monitoring the system: (1) the senior faculty member meet-
ing with the seminar for approximately 10-15 hours each quarter, (2) an undergraduate
student with a quarter-time appointment for setting up, recording, and taking down the
videotape equipment for each class session recorded, and (3) an outstanding graduate
student instructor with a half-time appointment in economics for previewing, coding
and critiquing each class session tape and for handling the logistics of the seminar, stu-
dent evaluations and videotaping procedures. Each of the two students can be trained for
the system in less than five hours. With these resources, experience at Minnesota indi-
cates that approximately 7-10 GSIs can be processed through the entire system within
one quarter (i.e., the seminar, three videotaping episodes for each instructor and stu-
dent evaluations). The Department of Economics at Minnesota has found the results in
student and GSI performances and GSI satisfaction to be worth the costs in time, effort
and dollars.



Footnotes

¹ The study described in this paper was conducted experimentally during 1971-72 and was fully
implemented during 1972-73.

² Copies of the Purdue Rating Scale for College Instructors as well as the survey questionnaire will
be sent by the authors on receipt of a self-addressed envelope with sufficient postage for 2 ounces.

³ Although most of the GSIs in this experimental study had teaching experience prior to their
participation in the project, a weakness in the design of the study is the possibility that any superior
performance in the winter quarter may be attributed to their maturation and/or additional experi-
ence. However, similar data on student and GSI performances from an earlier study with the intro-
ductory course at the University of Minnesota indicate that the additional experience of only one
term is not significant [5]. In fact, the obverse was true in the current study. The GSI who had
the most previous teaching experience was the GSI in the fall term with the lowest student and
instructor performances and subsequently showed the most improvement as a result of the training
system. On the other hand, the two GSIs with the least teaching experience were among the top
three in student and instructor performances during the fall term.

⁴ A number of excellent discussions concerning the reliability, validity and usefulness of many of the
procedures and components included in the Minnesota GSI Training System can be found in re-
cent publications by Eble [2,3,4], Costin, et al. [1], and Nowlis, et al. [7].

⁵ As additional incentive for soliciting GSI participation in the project, the "best" instructors were
encouraged to retain the videotape of their best performance (usually their third), and include this
as a part of their vitae for subsequent employment. Additionally, each of the participating instruc-
tors was assured of anonymity throughout the entire training and evaluation proceedings. Both of
these practices have continued in the subsequent training of GSIs at the University of Minnesota.

⁶ Copies of both instruments will be sent by the authors on request.

⁷ An additional "Summary Checklist" for the videotape reviewing was also constructed and will be
sent by the authors on request.

⁸ Any discussion of output added on the TUCE must be qualified with the recognition that the out-
put-added function is clearly nonlinear; there are easy questions, questions of "medium" difficulty,
and some which are very difficult. In fact, the test was designed this way in terms of cognitive
composition. It is therefore somewhat inappropriate to compare increments on this test, as con-
structed. For example, at the extreme a student moving his total score from 3 to 6 on the TUCE
has picked up much less economics than a student moving from 28 to 31. Only on a truly "linear
test" can these types of comparisons be safely made.

⁹ A number of other possible independent variables were considered for inclusion. However, such
other variables as math background, major, and family background were found to be nonsignifi-
cant in other similar studies and/or intercorrelated with those identified in this study. Moreover,
for policy purposes, only those independent variables which were identifiable prior to the course
were included.

¹⁰ All variables in this model were found to have intercorrelations of .21 or less in the correlation
matrix except ACT and Pre-TUCE. They had a correlation of .31, a degree of intercorrelation but
not detrimental to the model's analysis since they were both significantly correlated with Post-
TUCE.

¹¹ The possibility of significant interaction terms (between X₁ and X₇, X₂ and X₇, X₅ and X₇, X₂ and
X₄) was also examined in subsequent regression models. The results were essentially the same as
those found in Table 2. No significance was found in any of the added variables.

¹² The reliability, validity and usefulness of student ratings of college teaching are also persuasively
presented in an excellent review article by Costin, Greenough and Menges [1].
ECONOMIC EDUCATION: ASPIRATIONS
AND ACHIEVEMENTS

By G. L. Bach and Phillip Saunders*

In recent years, widespread concern has been expressed over the low 
level of economic literacy of the general public. Since apparently only 
a small portion (perhaps 10-20 per cent) of all future citizen-voters 
will ever take as much as one economics course in college,¹ attention
has been increasingly focused on the possibility of developing at least a 
modicum of economic understanding through teaching in the high 
schools. 

Concern for this problem has developed gradually within the eco- 
nomics profession, culminating in the policy statement adopted by the 
AEA's Executive Committee in March, 1964 and the establishment of 
a new action-oriented Committee on Economic Education [9, p. 
565]. Prior to this action the Association's previous Committee on Eco- 
nomic Education had for some years attempted to stimulate economists 
to help high school teachers and administrators with this task, mainly 
through attention to the problem at annual AEA meetings. During the 
past five years, however, a more concerted effort has developed, includ- 
ing at least four major projects, two of which have involved AEA par- 
ticipation or sponsorship. 

This article summarizes briefly these recent developments and then 
presents the findings of recent investigations of what economics is ac- 
tually being taught in the high schools and by whom, and of the effec- 

*The authors are, respectively, professor and assistant professor of economics at Car-
negie Institute of Technology. The former is chairman of the AEA Committee on Educa-
tion, and this article represents, in part, a report to the profession on the results of
several special undertakings the Association has sponsored or encouraged. The authors
wish to thank especially their colleagues, Michael Lovell, Lester Lave, and Leonard
Rapping, for statistical advice at a number of points; and Mrs. Ann Brunswick of the
National Opinion Research Center who supervised the NORC study on which a sub-
stantial portion of this report is based. NORC has a separate detailed report on its find-
ings which includes extensive information on high school teachers and their backgrounds
not reported here [2].

¹ Less than 50 per cent of all present high school students will enter college, and ap-
parently only about one-fourth, certainly less than one-half, of these will take a course in
economics.



SOURCE: American Economic Review, vol. 55, no. 3, June, pp. 329-356. Printed with permission of the
authors and the publisher.
tiveness of recent AEA-sponsored steps to improve economic under-
standing. The major sections are: I. Recent Steps to Improve Econom-
ic Understanding; II. What Economics Do Students and Teachers
Know?; III. Current Materials and Teaching Practices; IV. The
Teachers; V. Who Watched "The American Economy"?; VI. Effec-
tiveness of "The American Economy"; and VII. Some Implications.
For those who want a quick overview, Tables 1 and 6, plus the con- 
cluding "Implications," may be helpful. 

I. Recent Steps to Improve Economic Understanding

In 1959 the AEA Committee on Economic Education appointed
a 13-man Textbook Study Committee to analyze the economic content of 
the textbooks being used in high school social studies courses (in eco- 
nomics, problems of democracy, and American history). Paul Olson 
served as the chairman. This committee reported to the profession in 
"Economics in the Schools," published as a supplement to the Ameri- 
can Economic Review in 1963 [11]. 

In 1960 the Association appointed a National Task Force on Eco- 
nomic Education, an independent group of well-known economists, to 
describe for high school administrators and teachers a minimum core 
of economic understanding fundamental to good citizenship and rea- 
sonably attainable by most high school students. The need for such a 
statement from the profession had been widely voiced by school teach- 
ers and administrators, school boards, and leading citizens. The report 
of the Task Force, Economic Education in the Schools, was published 
in 1961 [12]. Some 250,000 copies of this report have been distrib- 
uted in full or in summary form to laymen and to secondary school ad-
ministrators and teachers throughout the United States.²

In 1961 the AEA agreed to serve as co-sponsor of a new year-long 
national television course on economics, called "The American Econo- 
my," and offered in "The College of the Air" series carried by 182 
CBS stations and virtually all the educational television stations in the 
United States during 1962-63 (and rebroadcast by many in 1963-64). 

² Members of the Task Force were G. L. Bach, Chairman, Lester Chandler, R. A.
Gordon, Ben Lewis, and Paul Samuelson from the profession; Arno Bellack and M. L.
Frankel from secondary education; and Floyd Bond, Executive Secretary. The report is
available from the Committee for Economic Development, 711 Fifth Ave., New York 22,
N.Y., which provided funds to finance the Task Force's work and published the report,
with no control over, or responsibility for, the conclusions reached. The CED also played
a leading role in stimulating action on "The American Economy" and the "Test of
Economic Understanding" described below, and in securing funds to finance these projects,
but in each case with no control over the project itself or the findings of the economists
and other professionals involved. The CED's policy statement, Economic Literacy for
Americans [13], provides a summary of its role, and its own position on some of the
substantive issues involved.
Professor John R. Coleman of Carnegie Institute of Technology was
the national teacher; some 40 leading economists participated as guest
lecturers; and the members of the National Task Force, at the request
of the AEA, served as a policy and advisory committee in planning the
series. The course was aimed to help the general public, students, and
especially school teachers with inadequate backgrounds in economics.
It was based substantially on the report of the National Task Force
as to content and approach.³ "The American Economy" had an
average daily audience of over one million viewers (according to a
standard Nielsen survey and reports from the participating education-
al television stations), including some 15,000 high school teachers and
about 5,200 students and teachers who registered for college credit at 
one of the 361 cooperating colleges and universities. The cost of the 
program was approximately $1.5 million, which was provided by The 
Ford Foundation and a large number of leading business firms; in ad- 
dition CBS contributed free air time. This course, and the AEA spon- 
sorship of it with the Joint Council on Economic Education and Learn- 
ing Resources Institute (the producer), followed the general patterns 
successfully used by the professional associations in physics, chemis- 
try, and mathematics in preceding years. 

Following completion of "The American Economy," the National 
Task Force and the Learning Resources Institute commissioned the 
National Opinion Research Center, affiliated with the University of 
Chicago, to conduct a major national study of what economics is being 
taught in the high schools and by whom, and to analyze the effective- 
ness of "The American Economy" in raising the level of economic un- 
derstanding among high school social studies teachers, one prime tar- 
get of the program. The study was outlined by the National Task 
Force and conducted by NORC on the basis of a national stratified 
cluster sample. A cross section of high schools in the United States was 
selected. Teachers in those schools were questioned and they took a 
special test on economic understanding. The 2,791 actual questionnaire 
responses were then weighted to represent 4,677 high school social 
studies teachers, out of an estimated total of approximately 60,000- 
65,000 such teachers in the nation. The teachers thus surveyed com- 
pleted detailed questionnaires on their backgrounds, their teaching prac- 
tices and attitudes, and the content of their courses. In addition, each 
teacher took a 25-question version of the nationally standardized "Test 
of Economic Understanding" described below. All questionnaires and 
tests were administered by NORC personnel. The results of this study 

³ A course outline, including the names of participating economists, was reported in the
AER [10] and is available in more detail in the special guide prepared for the
course [4]. Professor Coleman has also presented his philosophy of the course in [3].
provide the first detailed picture, based on a scientifically designed
sample of all high school social studies teachers, of who the teachers of
economics are in the high schools, what their backgrounds are, what
they are now teaching, and how much they learned from "The Ameri-
can Economy."

Simultaneously, in 1962-64, a new standardized high school level
"Test of Economic Understanding" was developed by a related group
of leading economists, educational psychologists, and high school edu- 
cators to help school administrators and teachers evaluate how much 
economics their students know and are learning, using a large national 
norm group for comparison. This step, financed by the CED through 
the Joint Council on Economic Education, but entirely under the con- 
trol of the experts involved, was an attempt to meet the widespread 
demand for such a test that would be professionally competent and 
free from the suspicion of bias often leveled against tests devised by 
representatives of business or other economic groups.⁴ The test com-
mittee, under the chairmanship of Dr. John Stalnaker, President of the
National Merit Scholarship Corporation, spent approximately a year 
devising a "Test of Economic Understanding" for use with high school 
students at the 9th to 12th grade level, using the report of the National 
Task Force on Economic Education as a rough guide as to the con- 
cepts and areas to be tested. The test was designed for students with or 
without separate courses in economics, and dual forms were prepared 
so the test could be given to students on a before-and-after basis. Be- 
cause the test was designed for use by thousands, even hundreds of 
thousands, of students in widely varying schools and areas, the use of 
"objective" type questions was mandatory. All questions are multiple 
choice, with four alternative answers given to each. 

Questions cover the basic areas of micro- and macroeconomics, plus 
some questions on "applied" fields such as international economics and 
comparative systems, but they omit all technical detail beyond such 
simple concepts as supply and demand. There are a few factual ques- 
tions, but most are focused on the understanding of basic "concepts" 
and ability to handle "problem" or "application" situations. A few
questions call for analysis of major policy issues (such as monetary
and fiscal policy), and three require reading and interpreting graphs of
economic time series.⁵ Roughly, one-third of each test covers micro-

⁴ The economists on this committee were G. L. Bach (Carnegie Tech), E. O. Edwards
(Rice), J. A. Kershaw (Williams), Ben Lewis (Oberlin), and Lewis Wagner (University
of Iowa).

⁵ Typical questions are:
Factual:

1. In large business corporations common stockholders generally do not:
a. own the business
economics, one-third macroeconomics, and one-third special topics and
applications, such as the farm problem, international trade, compara- 
tive economic systems, and the like. Each question was pretested on 
thousands of students, many were extensively revised, and the test was 
carefully balanced for coverage, concepts involved, degree of difficulty, 
and kinds of understanding— all within the practical limits imposed 
by the need for a mass-testing instrument. The final test is extremely 
elementary. Every economist would probably devise a somewhat 
different one for the purpose; the constraints imposed by test experts 
and the realities of mass use in high schools were considerable. 

Beyond these major recent moves, culminating in the AEA policy 
statement in June, 1964 [9], many other groups have worked steadi-
ly to improve economics in the schools. The nonpartisan Joint Council
on Economic Education has perhaps worked most closely with profes- 
sional economists, and the AEA has long nominated economists to be 
members of the JCEE Board of Trustees. Important though these 
efforts may have been, however, it is impossible to detail them here. 

b. receive a share of the profits 

c. vote for the board of directors 

d. manage the day-to-day business 

Concept Understanding: 

1. Under a private enterprise economy the function of competition is to: 

a. eliminate wasteful advertising 

b. eliminate interest and profits 

c. prevent large firms from driving small ones out of business 

d. force prices to the lowest level consistent with a reasonable profit 

2. A rise in the price of which product would be likely to increase the demand for
butter?

a. butter 

b. oleomargarine 

c. bread 

d. any of the above 
Problem Analysis: 

1. If, when there is full employment, the federal government increases its spending 
without increasing its tax revenues, generally : 

a. a serious depression will occur 

b. an increase in unemployment will occur 

c. the national debt will decrease 

d. inflation will occur 

2. In a basically private enterprise economy, which tax is likely to alter most the 
pattern of consumer choices among alternative products? 

a. a general sales tax 

b. a personal income tax 

c. an excise tax on particular products 

d. a tax on business profits 

Copies of the full questionnaire are available from Science Research Associates, 259
East Erie Street, Chicago, Illinois, which handles distribution of the questionnaire. SRA
also has complete information on norm groups, plus an item analysis of all questions
included.
II. What Economics Do High School Students and Teachers Know?

How much economics do high school students and their teachers 
know? And how does their economic understanding compare with that 
of other groups, such as college students? 

Table 1 provides a summary picture, using test scores on the very 
elementary "Test of Economic Understanding" described above. While 
this test provides only a very limited measure of economic understand- 

Table 1. Level of Economic Understanding of High School Seniors and Othersᵃ

                                                                            Significance of
                                                    Test      Standard      Difference from
                                                   Scores     Deviation     Preceding Groupᵇ

High school seniorsᶜ
  No econ. courses (n = [illegible])                24.2        6.67
  One econ. course (n = 1534)                       29.7        8.19             .001
High school social studies teachersᵈ
  No college econ. courses (n = 717)                32.0        8.48
  1-2 college econ. courses (n = 1859)              32.8        8.42             N.S.
  3-4 college econ. courses (n = 1132)              35.5        8.94             .001
  5+ college econ. courses (n = 931)                37.2        8.48             .001
  Watched "The American Economy" 3
    or more times a week (n = 110)                  41.2        8.40             .001
College sophomores after econ. courseᵉ
  (n = 167)                                         40.7        4.20
Industrial employees and managersᶠ
  Foremen and 1st line supervisors (n = 319)        34.2        8.61
  Middle management (n = 313)                       36.3        8.13             .001
  Staff: engineers, accountants, etc. (n = 96)      36.6        7.11             N.S.
  Top management (n = 9)                            42.7        8.4              .001

ᵃ All test scores are on the standardized 50-item, objective "Test of Economic Under-
standing," except test scores for teachers are converted from a special 25-item form of the test.

ᵇ Differences significant at .001 level where indicated. Differences marked N.S. were not
significant at the .05 level.

ᶜ Data from Science Research Associates on total national "norm group" for the "Test of
Economic Understanding." Widely representative sample of about 6,500 students.

ᵈ Data from NORC national stratified cluster sample of high school social studies teachers,
described in text. Score adjusted to statistically comparable 50-item basis from 25-item test
used by NORC.

ᵉ Data from studies of Carnegie Tech and University of Nebraska students, reported by
Phillip Saunders [8] and by C. R. McConnell and J. R. Felton [6], respectively. Standard
deviation is for Carnegie Tech students only.

ᶠ Data on special group of "industrial employees and managers" in 14 large, national com-
panies tested by SRA for comparison purposes. Not necessarily typical of all companies.
Data on top management shown merely for benchmark purposes.
ing, and while a more discriminating regression analysis controlling for
such related variables as previous courses in economics is presented in 
Section VI, the basic scores are interesting, even striking for some cat- 
egories. 

The top two lines show the mean scores of some 6,500 high school 
seniors, divided between those who had taken no course in economics 
and those who had completed a one-semester course. The range of 
scores was from 8 to 48 on the 50-item test. While the students were 
not a scientifically drawn sample of all high school seniors, they were 
chosen to form a representative national "norm group" for users of the 
"Test of Economic Understanding." They comprise a roughly repre- 
sentative cross section of high school seniors, from large and small 
schools and cities, different sections of the country, and varying in- 
come areas. 

As Table 1 indicates, even without a course in economics, high 
school seniors did roughly twice as well as the 12.5 score one would ex-
pect by chance, and taking a separate course in economics added 5.5 to
the mean score of 24.2. This difference, which represents over a 20 per 
cent improvement, is statistically significant far above the .001 level, 
but it must be remembered that other variables (for example the pos- 
sibility that brighter students tend to take separate courses in econom- 
ics and the fact that economics courses are primarily offered in the 
"better" schools) may account for some of the difference. Even allow- 
ing for such special factors, however, it seems clear that a high school 
course in economics significantly increases students' ability to answer 
questions like those included on the "Test of Economic Understand-
ing."⁶
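
The arithmetic behind this paragraph can be sketched in a few lines. The figures are those quoted in the text and in Table 1 (a 50-item test with four alternatives per question, and mean scores of 24.2 and 29.7); the variable names are hypothetical and used only for illustration.

```python
ITEMS = 50     # questions on the full test
CHOICES = 4    # alternative answers per question

# Expected score from pure guessing: one chance in four on each item.
chance_score = ITEMS / CHOICES

no_course_mean = 24.2    # seniors with no economics course
one_course_mean = 29.7   # seniors with one economics course

# Points added by the course, and the same gain as a percentage of the lower mean.
gain = round(one_course_mean - no_course_mean, 1)
pct_improvement = round(100 * (one_course_mean - no_course_mean) / no_course_mean, 1)

# How the no-course mean compares with the guessing baseline.
ratio_to_chance = round(no_course_mean / chance_score, 2)

print(chance_score, gain, pct_improvement, ratio_to_chance)
```

The percentage works out to a little under 23, which is the sense of "over a 20 per cent improvement" in the text, and the no-course mean is a little under twice the guessing baseline.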

The next five lines show the test scores of the large NORC sample 
of high school social studies teachers. About 30 per cent of these teach- 
ers teach significant amounts of "economics" in separate courses or in 
courses in "problems of democracy"; the others teach primarily histo- 
ry, civics, and other social studies courses. 

The mean score for all teachers was 34.1, with a range from 6 to 50 
on a 50-item test basis.⁷ But this composite hides interesting
differences. For example, high school social studies teachers who have 
never taken a separate course in economics (about one-sixth of the 
total) achieved a mean score of slightly over 32 out of 50. This was 
significantly above the score of high school seniors with no economics 

⁶ For the results of a promising experiment with "programmed" teaching for high school
students, see R. Attiyeh and K. Lumsden [1].

⁷ Although actually a shorter 25-item form of the test was used, this form was item-
analyzed and was found to give virtually identical results with the longer version when
the 25-item scores were converted to the 50-item basis.
but less than three questions better than high school seniors after they
had taken a single course in economics. 

Teachers with one or two previous one-semester college courses in 
economics (about 40 per cent of all social studies teachers) scored 
slightly higher, but, strikingly, the difference was not statistically 
significant. More than two college economics courses raised the test 
scores in the following groups. But five or more college courses were 
required to raise the teachers' mean score by five points, the amount 
one recent high school course raised the high school students' scores 
from a lower base. Those teachers who regularly watched the national 
television course, "The American Economy," three times or more a
week averaged 41.2, considerably higher than teachers who had had 
five or more previous economics courses. But most of the regular 
watchers had also had three or more previous college courses in eco- 
nomics. One must remember also that the national television course 
was fresher in their minds (completed about nine months before the 
test was given), and other variables may have significantly affected the 
test scores. Section VI reports the results of a multiple regression anal- 
ysis to isolate the impact of the TV course on test scores, holding other 
presumably relevant factors constant. 
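
The comparisons in this paragraph follow directly from the teacher rows of Table 1. The sketch below simply recomputes them from the printed means; it says nothing about statistical significance, and the group labels and variable names are assumptions made for illustration.

```python
# Mean test scores for teacher groups, in the order they appear in Table 1.
teacher_means = [
    ("no college econ. courses", 32.0),
    ("1-2 college econ. courses", 32.8),
    ("3-4 college econ. courses", 35.5),
    ("5+ college econ. courses", 37.2),
    ("watched TV course 3+ times a week", 41.2),
]

# Difference between each group and the one listed just before it.
step_gains = [
    round(later[1] - earlier[1], 1)
    for earlier, later in zip(teacher_means, teacher_means[1:])
]

# Gain of the five-or-more-courses group over teachers with no college economics.
five_plus_gain = round(teacher_means[3][1] - teacher_means[0][1], 1)

# Gain from one high school course among seniors, for comparison.
senior_gain = round(29.7 - 24.2, 1)

print(step_gains, five_plus_gain, senior_gain)
```

On these figures the five-or-more-courses group stands about five points above teachers with no college economics, close to the 5.5 points associated with a single high school course among seniors.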

The bottom sections of Table 1 show the scores of Carnegie Tech 
and University of Nebraska sophomores who had just completed a 
regular course in economics or the national TV course, as reported by 
McConnell-Felton [6] and Saunders [8]. The mean score of approximately
41 was virtually identical for students in both schools,
and for those who took a regular lecture-discussion course or the na- 
tional TV course. Last, data are shown for samples of industrial super- 
visors, staff workers, and managers. The sample, drawn from 14 large 
companies, is not necessarily representative of all industrial firms, but 
it appears to be a roughly representative cross section of larger, rea- 
sonably well-known companies. The 42.7 mean score for top managers 
rests on a number of responses far too small to be reliable, but it is included
for its casual interest. The other scores rest on respectable samples.
Again, other variables would need to be analyzed to ascertain the
precise significance of the scores, but they do present a crude basis for 
comparison with the scores for other groups shown. 

Beyond these summary measures, detailed analyses of student and 
teacher performance on individual questions are available. By and 
large, students without economics training in high school missed ques- 
tions indiscriminately, with no clear pattern as between subject matter 
areas or factual versus concept versus problem and application ques- 
tions. Questions on monetary policy and operations, international eco- 
nomics, and comparisons between the Soviet and the U.S. economic
systems were heavily missed, although teachers report (see below) that
they consider comparative systems the most important area in econom- 
ics to teach. Students generally got the central notions of consumer 
sovereignty and the role of competition in our system, and did surpris- 
ingly well on simple supply, demand, and price questions even without 
formal economics training. 

There was no clear pattern in the improvements induced by study- 
ing a course in economics in high school. Scores improved on all types 
of questions, but especially on factual questions involving general 
magnitudes, on the comparison between the U.S. and the Soviet econo- 
mies, and somewhat more on micro questions than on macro. The 
questions still missed after a course in economics tended to be the 
"harder" ones by analytical standards. But this was not a clear pattern. 
Even after a course in economics, nearly the same proportion still 
missed a simple question asking who is hurt most by inflation (farmers, 
debtors, government bond holders, or businessmen), which was missed 
by 76 per cent of the students. The balance of payments and monetary 
policy continued to be enigmas to a large share of the students. 8

High school teachers, like their students, did better on micro than 
on macro questions, and did better on factual than on analytical questions.
Beyond these general observations, no clear pattern emerged
from the item analysis. As with the students, teachers having formal 
courses in economics showed general improvement, especially on "con- 
cept" and analytical questions. But the pattern of improvement was 
not a clear one. 

III. Current Courses and Teaching Practices 
A. Courses and Textbooks 

Of the 12 million students in high school this year in the United 
States, somewhere between 10 and 20 per cent will take a separate 
course in economics before graduation. No completely reliable data 
exist on the figure, but three independent estimates based on large 
samples all fall within this range. 9 Put the other way round, 80 to 90 

8 Tables showing a complete item analysis of the 100 questions on the two test forms
are available on request.

9 A U.S. Office of Education total count for 1961-62 shows about 290,000 students
registered in separate courses of economics during the year, of a total of about 2 million
seniors. About 220,000 were in required courses, the balance in electives. But this
questionnaire information leaves a substantial question as to just what is reported as a course
in "economics." For example, "consumer economics" is probably included by many
schools.

The Joint Council on Economic Education obtained substantially complete information
on courses in economics in the 130 largest school systems in the United States and
estimated on this sample (with adjustments for smaller systems based on uncertain
evidence) that nearly 20 per cent of all seniors are taking a separate course in economics.
Our estimate, making use of the NORC and other data, is more like 15 per cent.

per cent will graduate without having any formal instruction in economics
per se.

In the 130 largest school systems in the country, approximately nine 
out of ten offer a separate course in economics of at least one semester, 
and over a quarter (including New York City) require a one-semester 
course in economics of all graduates. Nationwide, however, only about
40 per cent of all public high schools offer a separate course in economics,
indicating that such courses are rare in the smaller schools
outside major metropolitan areas. But there seems to be a clear trend
in the direction of more required work in economics, in separate eco- 
nomics courses, and in other courses like problems of democracy and 
civics. Pennsylvania, for example, has recently mandated a require- 
ment of 36 class hours of economics for every high school graduate in 
the state. About four-fifths of the 130 largest school systems reported 
curriculum revisions within the last three years increasing the amount 
of economics taught in high school, and the overwhelming majority of 
the teachers in the NORC study also reported plans to increase the 
emphasis on economics. Detailed information is presented below on the 
content of such courses and their success in raising the level of student 
understanding. 

Beyond separate courses called "economics," apparently 15-20 per 
cent of all high school seniors take a one-semester or one-year course 
in "problems of democracy," in which there is usually at least one 
large separate unit on "economics" or some economic problem such as 
social security or natural resources. Few of these go beyond descrip- 
tion of institutions and information on government legislation to deal 
with the problems faced. Nearly all states have a mandated course in
"civics" or American government; many of these courses have units in 
"economics," mainly of a descriptive-institutional nature. On his way 
through high school, virtually every student must take a course in 
American history, and many schools claim that some economics is 
taught in American history courses. However, there is little evidence to 
back up this claim, unless one counts the fact that the student hears
something about the growth of economic institutions such as business 
firms and the westward movement of the frontier. Most of the history 
books mention bimetallism, the Granger Acts, tariff legislation, and the 
like, as well as the development of agriculture and industry as part of 
the historical sweep which the student traverses, but there is little that 
the economist would recognize as economics. Lastly, many students in 
"business education" programs take a required course in "economics." 
Generally, this is heavily weighted with elementary personal finance, 
bookkeeping, office practice, and the like, although some broader 
courses are appearing.

Extensive field studies, textbook analyses, and reports of teachers
generally confirm the picture of high school economics teaching. In a 
few schools, particularly in big cities and upper-class suburban areas, 
some very good high school economics is apparently being taught. But 
even in courses called "economics," the coverage is generally descrip- 
tive and nonanalytical. Much space is given to economic institutions, 
descriptions of natural resources, laws, governmental regulations, and 
the like. In the last few years, some texts have introduced some atten- 
tion to problems of aggregate economics. Sometimes elementary eco- 
nomic concepts are introduced (for example, supply, demand, and 
price), but these are seldom used in application to the problems con- 
sidered in the following chapters. At least to us, most of the texts seem 
taxonomical and probably dull for typical high school youngsters. As 
the AEA's Textbook Study Committee wrote: "Perhaps the most 
alarming characteristic of textbooks in all three courses [economics, 
social studies, and U.S. history] is the dominance of description over 
analysis in the treatment of those economic topics selected for discus- 
sion" [11, p. x]. 

B. Areas of Economics Taught in High Schools 

High school teachers in the NORC survey described above were 
asked to select from a list of 11 subject matter areas in economics 
those which they considered to be "very important" to teach and those 
to which they actually devote at least six classroom sessions each year.
Table 2 shows the results separately for those teaching courses in eco- 
nomics or problems of democracy (about 30 per cent of the total sam- 
ple) and for all others teaching some economic topics, usually in civics, 
history, or general social studies courses. In reading Table 2 it should 
be remembered that, since most high school students do not take a 
course in economics, the attitudes and practices of teachers outside 
economics and problems of democracy courses may be more important
on these issues than those of instructors teaching separate courses in 
economics. 

Comparative economic systems (capitalism, communism, etc.) is 
considered far and away the most important area of economics to 
teach by both groups. About 90 per cent of all teachers considered it 
"very important," and about half report spending six or more periods 
on the subject. 10 On the other hand, "the role of markets, prices, and 

10 These high figures for comparative systems may reflect partly a tendency to put into
this category ("capitalism, socialism, and communism") general treatments of the U.S. 
economy ("capitalism") which don't fit into any of the other categories, even when little 
attention is paid to other systems. However, a substantial portion of the states now 
require a course, or some minimum number of days, on comparative political, economic, 
and social systems, often with specific mention of communism and the USSR. 






THE AMERICAN ECONOMIC REVIEW 



Table 2—Areas of Economics Emphasized by High School Teachers*

                                             Rated "Very Important"     Plan to Devote at Least
                                             by Teachers in:            Six Classes to:
                                                            (per cent)
Area                                         Econ. &      All           Econ. &      All
                                             P.O.D.**     Others        P.O.D.**     Others

 1. Comparative systems (capitalism,
    communism, etc.)                            91          87             53          47
 2. Government finance, taxes, etc.             75          68             37          26
 3. Labor unions, distribution of incomes       75          67             44          30
 4. Development of economic institutions        70      [illegible]        41          31
 5. Role of markets, prices, and profits        70          55             38          20
 6. Booms, depressions, inflation, etc.         67          69             37          24
 7. Government regulation of business           67          65             21          20
 8. Consumer economics, personal finance        58          58             37          24
 9. Money and banking                           55          49             36          20
10. International economic problems             46          45             33          32
11. Underdeveloped economies                    41          42             29          33

* Data from NORC. Table includes only teachers who reported that they teach something
about economics, about 80 per cent of the total sample.
** Problems of Democracy.

profits" and "booms, depressions, and inflation" ranked well down the 
list among most social studies teachers, but, encouragingly, higher 
among those teaching economics per se. Interestingly, the distribution 
of income (labor unions, wages, social security, etc.) rated high among 
both groups, in contrast to what appears to be a tendency in the 
profession at the university level to play down this area in elementary 
courses. University economists will be interested in the place given the 
topic, "consumer economics, personal finance, etc." Although it ranked 
well down the list for both groups, it was still considered "very impor- 
tant" by more teachers than were money, international economics, and 
the underdeveloped economies. 

C. Teaching Approaches and Attitudes 

All teachers who cover any economics in their courses were pre- 
sented three concrete topics (labor unions, the farm problem, and 
booms and depressions) and alternative ways of approaching each.
They were asked to indicate which of three approaches they predomi- 
nately use in teaching about each. One alternative stressed the histori- 
cal approach, another the descriptive-institutional, and the third the 
development of economic concepts and their use in analysis of the situ- 
ation. After this question was answered, teachers were asked 
specifically which of the three approaches they generally use the most.

Table 3—Teaching Approaches*

Approach                       Economics and P.O.D.     All Other
                               Teachers                 Teachers

Descriptive-institutional            6.3                   6.3
Concept-analytical                   5.8                   5.4
Historical                           4.8                   5.2

* Based on NORC study. Each index is based on responses of that group on the teaching
approach to each of the three problems presented, where 10 would represent use of that
approach by all teachers on all three issues.



Not surprisingly, most teachers reported some use of all three ap- 
proaches. Table 3 shows the relative stress placed on each of the three 
by teachers of economics and P.O.D., and by all others who teach any- 
thing about economics in their courses. The index number reported for 
each approach represents a weighted average of the number of first, 
second, and third choices each approach received relative to the others, 
where 10 would represent use of that approach by all teachers on all 
three topics. 

As might be expected, the descriptive-institutional approach leads. 
More surprising is how closely the concept-analytical approach fol- 
lowed. But a warning is in order. All questionnaires were completely 
anonymous, and the questions on the treatment of labor unions, the 
farm problem, and economic growth and fluctuations carefully avoided 
such colored terms as "analytical" and "descriptive," so there should 
be little bias. But use of the "concept-analytical" approach in most 
cases implies only development of the simplest of economic concepts 
and their use in only the most elementary way. Other evidence, e.g., 
the AEA's textbook study quoted above, suggests little use of what 
economists would call "analytical" approaches to economic issues, no
matter what the teachers replied on this part of the NORC study. 

IV. The Teachers 

Most of the economics taught in the high schools is offered in 
courses in "economics" or "problems of democracy." Table 4 presents 
information on the teachers of these courses and compares them with
all other social studies teachers.

The table indicates that nearly all economics and P.O.D. teachers
have had at least one college course in economics and that 58 per cent
have had three or more. 11 A quarter of all economics and P.O.D.
teachers have had five or more courses in economics, and 4 per cent 

11 It will be remembered from Table 1 that there was no significant difference on
economics test scores for teachers with zero or one-two college courses in economics.

Table 4—High School Teachers of Economics and Problems of Democracy (a)

                                             Economics and        All Other Social
                                             P.O.D. Teachers      Studies Teachers
                                                         (per cent)

1. Number of college economics courses:
     None                                          10                   18
     1-2                                           32                   43
     3-4                                           32                   21
     5 or more                                     26                   18

2. Major in: (b)
     Economics                                      6                    3
     Other social science (inc. history)       [illegible]              79
     Other                                         38                   37

3. Rank in college class:
     Top 10 per cent                               26                   18
     Top quarter, not top 10 per cent              33                   37
     Second quarter                                29                   33
     Bottom half                                   12                   12

4. When took last economics course:
     Last year                                     11               [see note]
     2-5 years ago                                 32               [see note]
     5-10 years ago                                22               [see note]
     More than 10 years ago                        35               [see note]

5. Score on "Test of Economic Understanding" (c)   [see note]

[Note: for item 4 the scan preserves only three second-column values (30, 30, 35), and
for item 5 only a single value (32.9); their row and column positions cannot be recovered.]

(a) Data from NORC survey.

(b) Totals more than 100 per cent because some teachers reported more than one major,
including both undergraduate and graduate levels.

(c) Mean score on SRA test. For comparative data on other groups, see Table 1.

were majors in economics. Other social studies teachers have had con- 
siderably less economics. 



High school social studies teachers as a group came from the top 
half of their college classes, only 12 per cent from the bottom half
(though about two-thirds of the teachers obtained their grades through
majors in "education"). Economics and P.O.D. teachers stood somewhat
higher in their college classes than did other social studies teachers;
26 per cent were in the top 10 per cent of their college classes.
And over two-fifths of them were reasonably up to date, if recency of
completing the last course in economics is a measure. Conversely, as 
indicated by item 4, nearly 60 per cent have not had a course in eco- 
nomics in the last five years, and a third not in the last ten years. The 
last line of the table shows that economics and P.O.D. teachers as a 
group did significantly better on the "Test of Economic Understanding"
than did all other social studies teachers, as could be expected.
However, this comparison includes many other variables which
influence test scores; a more thorough analysis, using a multiple 
regression technique, is provided in Section VI. 

The NORC study also provides a large amount of additional infor- 
mation on high school social studies teachers. For example, 80 per cent 
of all such teachers were men. About one-third were under 30 years of 
age, another third between 30 and 44, and the rest older; the median 
age was 33.5. Interestingly, the proportion of these reporting no course 
in college economics, or only one to two such courses, was appreciably 
higher in the youngest age group than in the middle age groups. This 
suggests that the portion of potential social studies teachers taking 
courses in economics is lower now than it has been in the past, al- 
though it also reflects the fact that some teachers take economics 
courses toward advanced degrees after they begin teaching. About 40 
per cent of all social studies teachers have earned some degree beyond 
the Bachelor's, and about 40 per cent are currently working toward 
some academic degree. Eleven per cent of all social studies teachers 
reported their college major was physical education. 

Apparently about one-tenth of all social studies teachers are hired 
new each year. About a third have been teaching less than five years, 
while 43 per cent have been teaching at least 10 years. About 70 per 
cent report that all of their teaching is in the area of the social studies, 
with history the dominant area. About two-thirds taught at least one 
course in history, while about 13 per cent reported teaching a separate 
course in economics or economic institutions. 

The median annual income of all social studies teachers in 1962-63 
was $6,150. About 25 per cent reported incomes under $5,000 and 21 
per cent reported $7,500 or over. Their family backgrounds, as measured
by father's occupation, conformed closely to the composition of
the general population, except that more social studies teachers came
from professional and fewer from farm families than in the general population.
About 20 per cent of all social studies teachers reported reading
the New York Times regularly; 5 per cent added the Wall Street
Journal. Two per cent admitted to reading the American Economic
Review regularly, while about 10-15 per cent reported regular reading
of Social Education, Social Studies, or similar publications.

V. Who Watched "The American Economy?" 
We turn now to an evaluation of the success of "The American
Economy," the nationwide television course sponsored by the AEA in 
1962-63. It was the most widely watched educational television course 
in history. Its total audience, averaging over one million persons daily, 
was apparently about twice as large as the highest previous audience
for a comparable national TV course, which was the course on probability
and statistics broadcast in the preceding year. The some 5,200
viewers enrolled for credit at participating colleges was also the largest 
on record. Over 45,000 TV study guides for "The American Economy" 
were sold, and the National Educational Television Center reports that 
the sound films made from the TV tapes are the most widely used of 
any educational TV series ever produced— 3,346 rentals and 531 sales 
of films in the course had been made as of November 30, 1964. 

About 20 per cent of the 65,000 high school social studies teachers 
in the country watched the program at one time or another in 1962-63
or 1963-64. The NORC national sample survey, however, indicates 
that only about 5 per cent watched the program at least once a week 
throughout the 1962-63 year. A separate survey conducted by the Na- 
tional Association of Secondary School Principals of five large states 
(California, Connecticut, Illinois, Minnesota, and New York) indicated 
that approximately 15 per cent of the social studies and business 
education teachers in those states were watching the series "on a regular
basis." Thus, it is clear that a substantial proportion of all social
studies and business education teachers watched at least some of "The
American Economy," but it seems probable that not more than 5-10 
per cent of them (perhaps 3,000-6,000) were serious, regular viewers. 

Some 245 colleges and universities offering a credit course based on 
"The American Economy" reported 5,200 students signed up for credit 
as of March, 1963. A subsequent postcard survey by the authors of 
this article indicates that approximately 85 per cent of these people 
(some 4,400) successfully completed the course for college credit. Of 
those completing the course, slightly over 40 per cent (some 1,800) 
were reported as school teachers. The remainder were regular under- 
graduate students taking "The American Economy" as their introductory
economics course or other persons taking the course for credit.

Other surveys by the National Association of Secondary School 
Principals (nationwide) and the Committee for Economic Develop- 
ment (New Jersey), together with the NORC data, indicate that cer- 
tainly over 1,000, and perhaps as many as 1,500, high school social 
studies and business education teachers successfully completed "The 
American Economy" for college credit. 

A. Teachers Who Watched 

What do we know about the high school social studies teachers who
watched "The American Economy" regularly? Table 5 summarizes the
answer, and compares these watchers with high school social studies 
teachers who were not regular viewers. 

Perhaps the most striking finding is that two-thirds of all the high
school social studies teachers who watched "The American Economy"
regularly at least once a week had previously had three or more 
courses in economics. Sixty-two per cent reported at least one graduate- 
level course in economics, and 44 per cent reported two or more such 
graduate courses (sometimes Schools of Education give graduate edu- 
cation credit for elementary work in economics when supplemented by 
advanced work in teaching methods). Only 14 per cent reported no 
course in economics. On the other hand, only 4 per cent reported that 
economics was the major field of their last academic degree. Item 3 con- 
firms the related fact that the watchers were mainly people who were 
actively concerned with economics; more than half of all regular viewers 
were currently teaching a course in economics or problems of democ- 
racy. 

This finding that half or more of the regular viewers had already 
had a substantial amount of economics accords with previous experi- 

Table 5—High School Social Studies Teachers—Viewers and Nonviewers (a)

                                                Regular          All Other H.S.S.S.
                                                Viewers (b)      Teachers
                                                  (per cent in each category)

1. Previous economics training:
     No economics courses                          14                 16
     3 or more economics courses                   67                 42
     Economics major for last degree                4                  2.5
2. Advanced degree (beyond A.B.)                   61                 37
3. Teach a course in economics or P.O.D.           53                 13
4. Standing in college graduating class:
     Upper 10 per cent                             27                 20
     Upper 25 per cent                             65                 56
5. Degree of professional activity (c)             14.4               10.1
6. Median age                                      43.8               33.4
7. Median years teaching                           14.3                8.3
8. Sex (per cent male)                             73                 81

(a) Based on NORC sample. Per cent in each case shows percentage of all teachers specified by
the column heading, except for items 5-8, which are actual numbers.

(b) Those who reported watching the program regularly once or more weekly throughout the
1962-63 year.

(c) Weighted index of four measures of professional activity, including (a) the number of
professional organizations to which the teacher belonged; (b) the number of times he has held
office in a professional organization; (c) the number of professional and academic meetings
attended over the past year; and (d) the number of professional and technical periodicals
which the teacher reads regularly. Performance on each measure was coded 0-6, and possible
scores on the index run from 0 to 24.







ence with educational television and related mass media. Previous 
studies have found that the role of such educational media is more one 
of reinforcing and supporting existing attitudes and interests than in 
developing new ones. People select from their environment stimuli that 
are meaningful to them in terms of previous experiences. Furthermore, 
recent studies in the field of adult education and of audiences for edu- 
cational television in other fields show that those who participate in 
such programs and watch educational TV are more likely to be those 
who start with higher educational levels [5, pp. 80 and 136], [7, p. 
57]. 

Lines 4 and 5 of Table 5 suggest that regular watchers ranked some- 
what higher than other high school social studies teachers in academic 
standing, and that, as might be expected, they were generally more ac- 
tive in professional activities. Regular viewers were older and more experienced
than were other teachers, and women comprised a substantially
higher proportion than of all social studies teachers.

VI. Effectiveness of "The American Economy"

A. Economic Understanding 

How effective was "The American Economy" in adding to the eco- 
nomic understanding of its viewers? The NORC study asked all of the 
social studies teachers who viewed the program at all whether it 
"added a great deal," "added somewhat," "added a little," or "didn't 
add anything." About 40 per cent of regular viewers reported that the 
program added a great deal to their understanding, and another 45 per 
cent that it added somewhat. Conversely, only 15 per cent considered 
that the program added little or nothing. Somewhat surprisingly, regu- 
lar watchers who had already had three or more courses in economics 
felt that the program added just as much as did those who had had
one or no courses at all in economics.

Such general reactions, however, are suspect as evidence of the
actual learning that occurred from watching. As part of the NORC 
study, therefore, each social studies teacher in the sample was given a 
shortened, 25-question version of the "Test of Economic Understand- 
ing" described above. Performance on this test may serve as a rough 
measure of the economic understanding of each teacher. Therefore a 
multiple regression analysis was run to isolate the relative importance 
of watching the TV course and of some eight other variables in ex- 
plaining performance on the test. 12 The major variables used in the 

12 As was explained above, scores on the 25-item test were converted to a statistically
identical 50-item basis by multiplying by 2, to maintain comparability with the other
scores shown in Table 1.

analysis were: watching the TV course; taking it for credit; previous
training in college economics; teacher's standing in college graduating
class; whether or not respondent teaches a separate course in economics
or P.O.D.; teacher's professional motivation; and the personal
characteristics of age and sex. The regression equation was of
the usual linear form, $Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_{14} X_{14} + u$, where $u$
is a random disturbance term assumed to have the usual simplifying
properties.

Table 6 presents the results of the regression analysis. To provide 
extra information several of the main independent variables were subdivided
into additive subvariables. For example, regular watchers of
"The American Economy" were divided into "one or more times a
week" and "three or more times a week," where all of the second
group is included in the first. Thus, the coefficient in column 1 of Table 
6 for "watched one or more times a week" is to be interpreted in the 
usual fashion as the effect of this variable, holding all others constant. 
The coefficient for "watched 3 or more times a week" also shows the 
effect of this variable, holding all others constant; thus it shows the 
marginal effect of watching 3 or more times over 1-2 times a week. To 
obtain the full impact of watching 3 or more times a week, we must 
add the two coefficients (.64 + 7.24), which gives us 7.88. Since this 
marginal analysis is applied for all of the first four major variables, 
column 2 has been added to show the full (summed) effect of the final 
subvariable in each group. 
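
The additive coding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names `watch_dummies` and `full_effect` are hypothetical, and only the two coefficients quoted in the text (.64 and 7.24) are taken from the article.

```python
# Marginal coefficients for the two nested "watched" dummies, as quoted in the text.
B_ONE_OR_MORE = 0.64    # X1: watched one or more times a week
B_THREE_OR_MORE = 7.24  # X2: watched three or more times a week (a subset of X1)


def watch_dummies(times_per_week):
    """Return (X1, X2) under the nested coding, where X2 = 1 implies X1 = 1."""
    x1 = 1 if times_per_week >= 1 else 0
    x2 = 1 if times_per_week >= 3 else 0
    return x1, x2


def full_effect(times_per_week):
    """Contribution of viewing to the predicted score, other variables held constant."""
    x1, x2 = watch_dummies(times_per_week)
    return B_ONE_OR_MORE * x1 + B_THREE_OR_MORE * x2


# Hypothetical viewing frequencies: a non-viewer, a 1-2 times viewer, a 3+ times viewer.
for n in (0, 2, 3):
    print(n, watch_dummies(n), round(full_effect(n), 2))
```

Because every teacher with X2 = 1 also has X1 = 1, the second coefficient measures only the increment over the 1-2 times group, and the full effect for the most frequent viewers is the sum of the two.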

Table 6 indicates that watching "The American Economy" regularly 
three or more times weekly was far and away the most important vari- 
able in raising teachers' performance on the test of economic under- 
standing. Its coefficient of 7.88 was more than twice as large as that of 
about 3.6 for having taken five or more college courses in economics or 
for graduating in the top 10 per cent of one's college class (the best 
proxy we had for intelligence, though one which also includes other 
factors such as motivation). It was much more powerful than taking 
one or more, or even three or more, college courses in economics. No 
other variable approached these in positive explanatory power. 

The $R^2$ for the multiple regression is .152. For economists used to
working with time series, this will seem extremely low. However, it is
roughly in line with the $R^2$'s obtained in many other cross-section
studies, for example of consumption behavior, and the F-test shows it
is significant beyond the .001 level. The low $R^2$ may occur because
major variables have been completely omitted — although it is hard to 
see what they might be. More likely, it is because of the large amount
of random noise in such a large sample, and because the proxies used

Table 6—Multiple Regression Analysis of Relative Influence of Selected
Variables on Teachers' Test Scores (a)

Explanatory Variable                                  (1) Marginal         (2) Full
                                                      Subgroup             (Summed)
                                                      Coefficients (b)     Effect

Watched "The American Economy"
  X1 = one or more times a week                            .64
  X2 = three or more times a week                         7.24**             7.88

Took "The American Economy" for credit (c)
  X3 = in-service credit or college credit               -8.90**
  X4 = college credit only                                3.10              -5.80

College economics training
  X5 = one or more courses                                 .26
  X6 = three or more courses                               .88*
  X7 = five or more courses                               2.44**             3.58

College class standing
  X8 = top 25 per cent                                     .12
  X9 = top 10 per cent                                    3.48**             3.60

Teach high school economics or P.O.D.
  X10 = teach such a course                                                  2.36**

Professional motivation
  X11 = activity in professional organization                                -.12**
  X12 = has or is working for advanced degree                                1.16**

Personal characteristics
  X13 = sex (male)                                                           2.22**
  X14 = age                                                                  -.04**

(a) Based on NORC data for 3,966 teachers; some responses could not be used because of
incomplete data. See text for description of test of economic understanding used.

(b) Using one-tailed t-test, * = significant at .01; ** = significant at .001. For difference
between two columns, see text. All variables are dichotomous (0-1) except for age and for
professional motivation (a continuous variable described in Table 5).

(c) "In-service" credit is usually offered directly by high school system toward salary
increases; amount of work varies widely. "College" credit is formal credit, usually involving
some on-campus review in teaching sessions plus formal examinations.

for important explanatory variables (for example, for intelligence and 
motivation) are imperfect; we comment below especially on what we 
suspect are inadequate measures of motivation. Similarly, the equation 
weights any college economics course as the equivalent of any other. 
But the proxies all appear to be reasonably sensible, and at least to 
give a tentative answer as to the relative importance of watching "The 
American Economy" as against other obvious possibilities for explain- 
ing performance on the test used. 
The large negative coefficient for taking the TV course "for credit" 
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is one major surprise of the data. One might suppose that taking the 
course for credit would reflect high motivation, in addition to providing 
some additional education through supplementary on-campus or in- 
service classes. But taking the TV course for "in-service credit" 
subtracted nearly 9 points on the test score, holding all the other vari- 
ables constant. Those taking the course for college credit did 3.10 
points better than "in-service" credit-takers, but this difference was 
not statistically significant and, as indicated in column 2 of Table 6, it 
still left them with a coefficient of -5.80. 

This finding strongly suggests that the teachers taking "The Ameri- 
can Economy" for credit differed markedly from other teachers on 
some other variables not included in our regression; otherwise one 
must conclude that the supplementary college teaching and examina- 
tions usually required for students taking the course for college credit 
actually confused the teachers and detracted from what they would 
have learned from just watching the TV course. 

To investigate this unexpected result further, we examined directly 
all the cases of teachers taking the course for credit, only 42 individ- 
uals in the sample, of whom 23 were college-credit viewers and the 
others in-service credit viewers. This examination throws grave doubt 
on the superior motivation hypothesis for these watchers; about 25 per 
cent of the credit-takers reported watching less than three times a 
week. Indeed, it seems probable that many credit-takers may have 
wanted the credit more than the knowledge and did as little as possible 
to get by. A strong bimodal distribution of test scores among credit- 
takers supports this hypothesis. Several of the low scores came from 
the irregular watchers just noted, and the weighting system implied by 
the sampling procedure happened to give substantial weight to a few of 
these low test scores in producing the results shown. Clearly the 
"professional motivation" variables (X11 and X12) are weak surrogates 
for the real motivational differences that may have existed. And one 
might reasonably hypothesize that a strong unmeasured motivation 
factor for regular viewers as a group is a major explanatory variable 
not picked up by the present regression equation. 

There is additional evidence on the impact of the TV course, as 
measured by the same test. McConnell-Felton [6] and Saunders 
[8], using independent data in Pennsylvania and Nebraska, recently 
found that regular college students taking a regular year-long on-cam- 
pus sophomore college course in economics scored about the same on 
the "Test of Economic Understanding" as did students taking "The 
American Economy" for college credit at the same institutions. Mc- 
Connell-Felton reported, however, that on a more sophisticated test, 






THE AMERICAN ECONOMIC REVIEW 



requiring more advanced technical tools from economics, regular college 
students taking a typical sophomore course outperformed the TV 
students by a significant margin. In further work, not yet published, 
Saunders reports a clear marginal improvement in the test performance 
of teachers who have taken the TV course and have supplemented this 
by on-campus classroom work with regular university instructors at 
some institutions. These results suggest the importance of 
high-quality instruction if classroom experience is to have any value in 
adding to a good TV presentation. 13 

B. Residue from Previous College Economics Courses 

The coefficients for X5 and X6 in the regression equation indicate 
that one or two previous courses in college economics made no 
significant contribution to performance on the test of economic under- 
standing, while three or four courses added very little to test perfor- 
mance. Five or more college courses added 3.58* to the test score, but 
even this was only about half the contribution made by watching "The 
American Economy" regularly during 1962-63. 

This result may be interpreted by some as a devastating commentary 
on the effectiveness of our elementary economics courses or indeed on 
college economics as a whole. At least, it deserves further considera- 
tion. 

First, on the average, teachers have been out of college about eight 
years, and the absence of any residue from college economics courses 
may merely reflect the well-known phenomenon of forgetting. Indeed, 
psychologists have shown that retention of most learned material is 
very short unless the material is deemed relevant by the learner and is 
used or otherwise reinforced periodically. 14 Perhaps students retain 
virtually nothing from their other college courses either. But this is 
slight consolation if we measure the value of our courses by what lasts 
after the student escapes the final exam. The fact that we may do as 
well as other disciplines is hardly a happy defense. Nor does the fact 
that economics is not "used" or "reinforced" provide much of a ration- 
alization, since everyone in this sample is an active teacher in the so- 
cial studies, of which economics is surely one important component. 
Indeed, over 20 per cent are currently teaching a course in economics 
or problems of democracy. As is indicated by the coefficient for X10, 
this current involvement with economics helps appreciably to raise 
those teachers' test scores, but not enough to change the general 
picture significantly. It would have been remarkable had a college 
economics course taken (on the average) several years ago shown as 
much effect as the recent comparable TV course, and some allowance 
for this fact is required in assessing the relative success of the TV 
experiment; the McConnell-Felton and Saunders studies cited above 
provide direct evidence on this point. But this does not alter the basic 
finding of no significant residue from even a year of college economics. 

13 Preliminary findings from Saunders' further studies also show that this superior 
understanding of well-trained high school teachers is directly reflected in superior test 
performance of their high school students, as compared to control groups of students 
with other teachers. 

14 See, e.g., Carl Hovland on "Learning," in Handbook of Experimental Psychology 
[Ml. 

Second, the absence of a residue from basic college courses may 
reflect the fact that the test used is a bad measure of economic under- 
standing or that it measures the wrong things. Readers are invited to 
review the sample questions above and form their own judgments. Cer- 
tainly the test is extremely elementary, since it was constructed to test 
basic, though nontechnical, aspects of economic understanding. It cer- 
tainly will not discriminate effectively among people who know a good 
deal of economics —though a look back at Table 1 shows that mean 
scores for all the relevant groups are well below 100 per cent accuracy. 
Moreover, since it was designed to avoid rewarding mere acquaintance 
with technical textbook terms, this test may not show as strong an ad- 
vantage for formal education types of information and understanding 
as would some other tests. 

To obtain a further evaluation of the reasonableness of the "Test of 
Economic Understanding" for this purpose, the department chairmen 
of 30 leading universities and about 30 other leading economists espe- 
cially interested in basic economics were asked to take the test and to 
write a brief impressionistic evaluation of it for the purpose for which 
it was designed. About half did so. Every reply stated that the test 
seemed at least "satisfactory" for the purpose indicated; most stated 
that it seemed "good," "highly appropriate," or "very good," although 
a number expressed reservations about individual questions. But of 
course the test may be a "good" one for its primary purpose and still 
not be satisfactory for evaluating desired lasting effects of our college 
courses. 

Third, perhaps the minor carry-forward from college courses may 
reflect the fact that these teachers took unusually "poor" or "weak" 
undergraduate courses, and that students who took better courses 
would have performed significantly better on the test given. Since 
respondents' forms indicate where they studied as undergraduates, we 
have underway a supplementary analysis of this possibility. Preliminary 
data indicate that of about 4,200 teachers for whom information 
is available, about 1,300 (somewhat less than one-third) attended 
"teachers colleges." About 900 attended a group of 120 top "prestige" 
or very well-known universities and liberal arts colleges. About 2,000 
(just under half the total) attended other colleges and universities. We 
also have preliminary information on how many majored in "education"; 
such majors accounted for about two-thirds of the total 
sample. 

Economists who mistrust the educational standards of departments 
and schools of education may suspect that these data go far to explain 
the results reported above. Very preliminary analysis suggests that, by 
and large, noneducation majors did somewhat better on the test than 
their education counterparts in schools of comparable stature, and that 
high school teachers from the better-known schools did substantially 
better than those from other schools. But this, of course, may merely 
mirror differences in basic student abilities, and careful study holding 
such other variables constant will be required to judge whether in fact 
different types and sizes of programs and institutions appear to achieve 
significantly different lasting effects from their economics courses. We 
hope to report to the profession separately on this analysis in the near 
future. But while this further analysis may show significant differences 
in the success stories for different types of programs and institutions, 
the preliminary data suggest little reason to suppose it will change the 
basic picture presented here of generally low carry-forward from basic 
courses in college economics. 

C. Plans for Change and Teaching Approaches 

Teachers in the NORC sample also reported on plans to change the 
teaching time they will devote to different areas of economics. Nearly 
twice as many regular watchers of "The American Economy" (defined 
as those who watched three or more times a week) reported plans to 
increase the time spent on half or more of these areas next year as did 
nonwatchers. For example, 33 per cent of all regular watchers plan to 
increase the amount of time spent on six or more of the 11 areas of 
economics listed in Table 2, as compared to only 15 per cent of the 
nonwatchers. It is not, of course, permissible to attribute this 
difference solely to watching "The American Economy." Watchers 
may have been the ones who were inclined to put more time on eco- 
nomics in any case. However, the results are consistent with the hope 
that "The American Economy" would stimulate more attention to eco- 
nomics in high school social studies teaching. 

Subject to the same reservation about causation, it is interesting 
that regular watchers show especially large increases in time spent on 
the core materials of macro- and microeconomics, compared to other 
teachers. For example, 39 per cent of all regular viewers are spending 
significantly more time on "economic stability and growth" than before 
the course, compared to only 22 per cent of other teachers. Similarly, 
37 per cent of the regular watchers now spend significantly more 
time on "the role of markets, prices, and profits," compared to 23 per 
cent of other teachers. Roughly comparable results were reported on 
"the development of modern economic institutions." In other areas of 
economics, the differences between the plans of watchers and non- 
watchers were much less marked, except that 33 per cent of the regular 
viewers reported plans to spend less time on consumer economics, com- 
pared to much smaller changes for other teachers. These results are 
consistent with the hope that "The American Economy" would develop 
more understanding of the central analytical core of economics and the 
way it can be used in thinking about economic problems. 

Similarly, regular viewers reported a much greater emphasis on "an- 
alytic" (as contrasted to "descriptive" and "historical") teaching ap- 
proaches than did occasional or nonviewers. On the three situations 
given (see III C), 24 per cent of regular viewers chose the analytical 
approach in teaching on all three, as compared to only 9 per cent of 
the other teachers. And 43 per cent of regular viewers chose an analyti- 
cal approach to two of the three situations, as compared to only 27 per 
cent of the occasional or nonviewers. Again, the results are consistent 
with the hopes of "The American Economy" to stimulate a more ana- 
lytical approach to economic issues, although they certainly cannot be 
attributed solely to that course. 

This evaluation of the effectiveness of "The American Economy" 
does not include a large amount of ad hoc evidence reported by teach- 
ers, school administrators, and others interested in economic education 
from around the United States. These reports, almost without excep- 
tion, agree that "The American Economy" was a widely watched, 
popular, effective TV program and course in economics. An abbreviated 
60-film condensation of the course has been widely used in "in-service" 
teacher training programs and is now being used as the foundation 
for teacher development in 50 major school systems, under a new 
program by the Joint Council on Economic Education to help improve 
economic teaching in the schools. Experiments are also under way for 
the direct use of the films in high school courses. College teachers are 
using selected films widely, and industrial firms are using parts or all 
of the course for in-company development of middle- and lower-management 
people. 

VII. Some Implications 

In conclusion, we suggest the following as some implications of the 
findings. 

1. If we want most of our future citizens to have any formal training 
in economics, it must be given in the high schools, barring an enormous 
change in the national educational pattern beyond even the large 
increases in college enrollment currently expected. Thus, unless the 
profession wishes to wash its hands of responsibility for economic 
understanding of the citizenry, it must take a strong, active interest in 
the teaching of economics in the high schools. 

2. It is possible to teach a substantial amount of economic under- 
standing to average students in the high schools. Even with present in- 
adequate high school courses in economics, students taking such 
courses showed large improvements in average test scores on the sim- 
ple "Test of Economic Understanding." Still unpublished experiments 
in particular localities confirm this possibility, and indicate that with 
well-trained high school economics teachers or with effective "pro- 
grammed learning" the improvement can be much more dramatic than 
shown in Table 1. 

3. Better-trained high school teachers are critical in improving 
economic understanding provided by the schools. The small test margin of 
the mass of social studies teachers over average high school students 
who have had merely a weak one-semester course in economics is 
dramatic evidence on this point. Superintendents and other school 
administrators repeatedly stress the importance of improving the basic 
economic understanding of their social studies teachers if real improvement 
is to be made in their teaching. As indicated above, recent experiments 
confirm this strongly. Intensive work with competent, interested, and 
understanding university economists, followed up by in-service help, 
can dramatically improve the understanding of average high school 
teachers, their ability to teach effectively, and the performance of their 
students. However, merely taking more courses in economics or going 
through weakly taught summer institutes or in-service programs 
apparently does little good for high school teachers; quality of instruction 
and teaching materials appear to be crucial. 

4. Unless the results reported above are grossly misleading, it is 
clear that present (or previous) college courses in economics don't do 
an effective job of preparing school teachers to teach economics, even 
recognizing the reservations indicated above. Whatever our students 
do on the final exam, the several-years-after test shows little residue, 
even for high school teachers for whom economic issues provide a part 
of their day-to-day teaching responsibilities. These findings emphasize 
again the well-known psychological principle that "learning" 
unsupported by motivation and reinforcement through repeated use or other 
means has a very short half-life. If our college courses don't develop 
student interest in economics for the years to come and if the analysis 
we teach isn't usable and used by students on their own after college, 
there is little reason to expect much to last, however elegant the 
analysis or important the descriptive material in the course. 

5. Since the average age of high school social studies teachers is 
only 33, and since about one-third of all teachers have been teaching 
less than five years and nearly two-thirds less than 10 years, 
improvement in the economic training provided in the colleges and universities 
could have a rapid impact on teaching in the high schools. 

6. Improved textbooks and other teaching materials are critically 
needed as a foundation for improved teaching of economics in the 
schools. This includes not only materials for special courses in econom- 
ics, but at least equally better materials for courses in problems of de- 
mocracy, civics, American history, and the like. It is essential to re- 
member that the great bulk of students get their exposure to economic 
issues in such courses. The economic preparation of the teachers in 
such courses is particularly weak, and such teachers badly need the 
best teaching materials. 

7. If we want to get more economic analysis and points of view into 
history, problems of democracy, and civics courses in the schools, 
growing experience suggests that such teaching materials must be fitted into 
the patterns of those courses. For example, simply preparing booklets 
on economic analysis or description of economic institutions to be 
included in courses in American history or civics is unlikely to have 
much influence. Conversely, carefully developed materials which fit into 
the pattern of the American history course and develop important 
economic concepts and ways of using those concepts within the flow of the 
history course have been found valuable by history teachers. 

8. Over all, there is little likelihood that economic understanding in 
the high schools will improve greatly unless school administrators and 
teachers get more sympathetic and active aid from professional 
economists than they have had to date. 

Phillip Saunders 

The Lasting Effects of Introductory 
Economics Courses 

How much economics do students learn in "typical" two-semester sopho- 
more introductory economics courses? How much of this learning lasts five years 
after these students graduate from college (some seven years after a sophomore 
economics course would have been taken)? How do students "feel" about the 
interest and difficulty of these courses? Do these feelings change as time goes 
on? Increasing numbers of calls for "accountability" in all the disciplines of 
higher education aside, these are important questions for economics, a 
profession with a particular concern for problems of efficiency and resource 
allocation. 

On most college campuses more students are enrolled in introductory 
economics courses than in all other undergraduate economics courses combined, 
and academic members of the profession collectively spend more time teaching 
introductory economics than any other single course. Yet, the one most widely 
publicized pronouncement on the subject continues to be the now-famous 
"Stigler hypothesis" which suggests that, if an essay test on current economic 
problems were administered to college seniors (or persons five years out of 
college), there would be no difference in the performance of students with a 
"conventional" one-year course and those who had never had a course in 
economics (Stigler 1963). 1 

If substantiated, the results predicted by Stigler would constitute a serious 
indictment of the pedagogical effectiveness of the profession. However, the kind 
of essay examination he proposed is not practical with the type of large 
nationwide sample that would be necessary to adequately test his hypothesis in 
its original form. Early empirical studies by Bach and Saunders (1965, 1966), 
using a multiple-choice examination, lent support to Stigler's hypothesis. 
However, these studies were limited to a nationwide sample of high school social 
studies teachers, and no data were available on the teachers' scholastic ability or 
the grades they received in their college economics courses. Also, the test used in 
the Bach-Saunders studies was the very elementary Test of Economic 
Understanding (TEU 1964), an instrument designed primarily for use with high school 
students. 

This paper reports the results of a study designed to provide a more 
adequate, albeit still indirect, test of the Stigler hypothesis and to obtain 
additional information on college students' attitudes about economics and on 
their reading habits. The regression analyses reported below indicate that 
Stigler's original prediction may be unduly pessimistic. Introductory economics 
courses did have a lasting impact on students' performance as measured by a 
college-level multiple-choice test constructed especially for this study, but the 
lasting effects of introductory courses on test performance do appear to diminish 
over the years. 

The study design and the survey instruments used are described in Section 
I. Test performance results are presented in Section II. Section III discusses 
some inconclusive results using additional variables for different types of 
introductory economics courses, and Section IV reports students' ratings of the 
introductory economics courses they took. Section V summarizes the major 
conclusions and discusses some implications of the study for the teaching of 
introductory economics. 

I. Study Design and Test Instrument 

This study compares the performance of students who have taken a 
"typical" two-semester college course in introductory economics with the 
performance of similar students who have not taken such a course. The comparisons 
were made at three different times: (1) immediately after an introductory course 
in economics, when most students are sophomores; (2) two years after an 
introductory course in economics, when most students are seniors; and (3) five years 
after students have graduated from college, some seven years after a sophomore 
economics course would have been taken. 

The data were collected during the 1969-70 and 1970-71 academic years 
from students and alumni at twenty-five carefully selected colleges and 
universities throughout the United States by means of a specially designed set of 
questionnaires. Part I of each questionnaire consisted of a series of questions 
concerning the respondents' sex, class standing or occupation, major course of 
study, whether they had taken introductory economics, their interest in 
economics as a subject, how important they thought economics was, whether they 
thought a course in economics should be required for college graduation, and a 
checklist of items designed to reveal their current reading habits. If the 
respondents indicated they had taken or were taking an introductory economics 
course, they were also asked to indicate whether it was required or an elective, 
and they were asked to rate the course, comparing it to other college courses they 
had taken, with respect to difficulty of subject matter, interest of subject matter, 
quality of textbook, quality of instruction, and time actually spent on the 
course. 

The information obtained from Part I of the questionnaires was 
supplemented by data from school records on the respondents' SAT scores, grades 
actually received in all undergraduate economics courses taken, method of 
instruction (large sections, small sections, use of graduate student instructors, 
etc.), and textbooks used in the various introductory courses involved. 
Information was also obtained on the "intellectualism" (explained below) of the students 
at the schools included in this study. 

Part II of each questionnaire was an especially devised version of the Test 
of Understanding in College Economics (hybrid TUCE). The test consisted of 
33 four-option, multiple-choice questions selected from the four forms of the 
original Test of Understanding in College Economics (TUCE). The selected 
questions were designed to cover specified topics in micro- and macroeconomics 
using three different types of questions designated as recognition and 
understanding (RU), simple application (SA), and complex application (CA). 2 
Eleven questions of each type were included. A deliberate attempt was made to 
omit questions that seemed to rely on specific technical details, since it did not 
seem reasonable to expect alumni who had taken an introductory course several 
years previously to perform well on purely technical questions. The hybrid 
TUCE sought to measure economic understanding, not memorization. The RU 
questions cannot necessarily be answered by rote memory, and the SA and CA 
questions require respondents to use economics to arrive at the correct answer. 
Most of the CA questions are prefaced by actual or hypothetical newspaper 
quotations such as one is likely to be called upon to interpret and understand in 
real life outside the classroom. 3 

There were 1,220 sophomore respondents in this study, 955 senior 
respondents, and 1,257 alumni respondents. The sophomore respondents included 535 
students (44 percent) with no college economics, and 685 (56 percent) with only 
a two-semester introductory course. The senior sample included 421 students 
(44 percent) with no college economics, 261 (27 percent) with only a 
two-semester introductory course, and 273 (29 percent) with up to nine one-semester 
college economics courses beyond the introductory level (the average for this 
group was 3.83 courses). The alumni sample included 435 respondents (35 
percent) with no college economics, 464 (37 percent) with only a two-semester 
introductory course, and 358 (28 percent) with up to nine one-semester college 
economics courses beyond the introductory level (the average for this group was 
3.50 courses). 4 

II. Test Performance of All Respondents 

The results of the regressions run with the hybrid-TUCE score as the 
dependent variable are shown in Tables 1 and 2. Since cross-sectional data were 
used, separate regressions were run for the sophomore, senior, and alumni 
samples. Two regressions were run with the alumni sample: a "short" one using 
exactly the same variables as for the sophomore and senior regressions, and a 
"long" one that included variables on marital status, occupation, and income, for 
which data were available from only the alumni respondents. Table 1 treats the 
respondents' experience in college economics courses only in terms of the 
number of economics courses taken. Table 2 also includes data for the grades 
received in each course. All course grades were converted to a scale on which 
A = 4, B = 3, C = 2, D = 1, and F = 0, and grades for the two semesters were 
averaged to get a single grade for the entire introductory course. 
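
The grade conversion can be sketched as follows. This is a minimal illustration, assuming the lowest grade on the scale is an F worth 0 points (the scanned text is unclear on that letter); the function and mapping names are hypothetical and are not taken from the study.

```python
# Four-point scale used to convert letter grades to numbers.
# Treating the lowest grade as "F" is an assumption made for this sketch.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def introductory_course_grade(first_semester: str, second_semester: str) -> float:
    """Average the two semester grades into one grade for the whole course."""
    points = [GRADE_POINTS[first_semester], GRADE_POINTS[second_semester]]
    return sum(points) / len(points)


# A student with an A in the first semester and a C in the second
# is treated as having a single introductory-course grade of 3.0.
print(introductory_course_grade("A", "C"))
```

Averaging in this way means that a half-point difference in the combined grade corresponds to a one-letter difference in one of the two semesters.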

ECONOMICS EXPERIENCE 

Introductory courses. As indicated in Table 1, holding other things 
constant, taking an introductory college economics course is significantly 
associated with a difference in total test score of 6.18 points in the sophomore 
sample, 4.76 points in the senior sample, and either 3.23 or 3.24 points in the 
alumni sample, depending upon which alumni regression is used. Thus, the 
major finding of this study is that taking an introductory economics course does 
have both an immediate and a lasting effect on hybrid-TUCE scores. 5 
Furthermore, Table 2 indicates that, holding other things constant, each letter grade in 
introductory economics is significantly associated with a difference in total test 
score of 2.00 points in the sophomore sample, 1.69 in the senior sample, and 
either 1.11 or 1.09 in the alumni sample. 

TABLE 1 

Multiple Regression Results with Economics Experience by Courses 
(dependent variable: total Hybrid TUCE score with range of 33 to 4) 

TABLE 2 

Multiple Regression Results with Economics Experience by Grades in Courses 
(dependent variable: total Hybrid TUCE score with range of 33 to 4) 

[Column headings and the upper rows of the table did not survive; the entries 
for the major-field variables are listed in their original order.] 

Major: ᵇ 
  (row label lost)     -.59    -1.46     -.94    -2.19*     -.76    -2.54*     -.99 
  Economics             .46      .63      .47      .65       .29      .49**    -.05 
  Nonecon. bus. ad.     .49     1.08     1.02     1.94       .26      .62*     -.36 
  Education           -1.19    -2.62**   -.13     -.26     -1.36    -2.54**   -1.41 
  "Other"              -.77    -1.44    -1.52    -2.57**     .11      .23      -.15 

Occupation: c
  Econ.                                             2.01     3.29**
  Econ. and bus. related                             .79     1.87
  Prim. or sec. teacher                             -.60    -1.64*
  College or jr. coll. teaching, & grad. student     .46     1.52
  Housewife                                         -.22     -.56
Income: d
  $25,000 and over                                   .40      .39
  $20,000-24,999                                    -.19     -.24
  $15,000-19,999                                     .45      .96
  $10,000-14,999                                    -.05     -.17
  Under $4,000                                      -.06     -.17
  No answer                                         1.69     2.28*

*Significant at .05 level.

**Significant at .01 level.

a Where not stated, code is 1-0, and 1 = yes.

b Major in social science other than economics suppressed in intercept.

c "Other" occupations suppressed in intercept.

d Income $5,000-$9,999 suppressed in intercept.
The results in both tables indicate that the differences in test performance
associated with introductory economics courses in the alumni sample are slightly
over half as large as they are in the sophomore sample. Since cross-sectional 
rather than time series data are used, this implies, but does not demonstrate, 
that there is an overall "retention rate" of some 52 percent-55 percent (or an 
overall "decay rate" of 45 percent-48 percent) from the end of a sophomore 
course to a time some seven years later. This is equivalent to an annual rate of 
depreciation of economic knowledge of about 6 percent. 
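
The retention arithmetic in this paragraph can be sketched in a few lines of Python. The coefficients and the seven-year interval are taken from the text; the function names and the two ways of annualizing the decay are assumptions made for illustration, since the text does not say how the annual figure was derived. Under these assumptions the straight-line figure comes out at roughly 6 to 7 percent per year, which is the one closer to the "about 6 percent" quoted above, while a compounding rate comes out at roughly 8 to 9 percent.

```python
def retention_rate(later_effect, initial_effect):
    """Share of the initial course effect still present in the later sample."""
    return later_effect / initial_effect


def straight_line_annual_decay(retention, years):
    """Average yearly loss as a share of the initial effect (no compounding)."""
    return (1.0 - retention) / years


def compound_annual_decay(retention, years):
    """Constant yearly proportional loss d that solves (1 - d) ** years == retention."""
    return 1.0 - retention ** (1.0 / years)


YEARS = 7  # "some seven years later", as stated in the text

# Coefficients quoted in the text: (alumni sample, sophomore sample).
course_retention = retention_rate(3.23, 6.18)  # Table 1: introductory course taken
grade_retention = retention_rate(1.11, 2.00)   # Table 2: per letter grade

for label, r in [("course taken", course_retention), ("letter grade", grade_retention)]:
    print(
        f"{label}: retention {r:.1%}, "
        f"straight-line decay {straight_line_annual_decay(r, YEARS):.1%} per year, "
        f"compound decay {compound_annual_decay(r, YEARS):.1%} per year"
    )
```

Because the samples are cross-sectional, these figures describe differences between groups rather than the measured forgetting of any one cohort.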

More disaggregated data not shown here indicate that between the alumni 
sample and the sophomore sample the implied "retention rate" is larger (and the 
implied "decay rate" is smaller) on the simple and complex application questions 
than on the recognition and understanding questions used on the hybrid 



Courses beyond the introductory level. The data in tables 1 and 2
indicate that each economics course taken beyond the introductory level is 
associated with a difference in total test performance of 0.53 points in the senior 
sample and 0.50 or 0.48 in the alumni sample. A difference of one letter grade in 
each such course is associated with a difference in total test performance of 0.18 
points in the senior sample and 0.16 or 0.15 in the alumni sample. As with the 
introductory course, this indicates a lasting impact for upper-level economics 
courses and course grades. 

OTHER CONTROL VARIABLES. Data were collected on several other variables
in addition to upper-level college economics courses, so that the influence,
if any, of these variables could also be held constant in comparisons between
respondents with and without introductory economics. Several of these variables
reported in tables 1 and 2 were consistently and significantly associated with test
performance in all three of our samples. Other control variables did not prove to
be consistently or significantly associated with test performance in all three
samples. Before discussing the various control variables, it should be noted that
the R² values of the regressions shown in tables 1 and 2 are high compared to other
cross-sectional studies done in economic education; but the variables in the
regressions still explain only 54-62 percent of the variation in total test scores
observed in the various samples.

Individual SAT scores. In all the regressions shown in tables 1 and 2,
each 100-point difference in a respondent's SAT score is positively and
significantly associated with a difference in total test performance. The size of
the coefficient is roughly one point, and ranges from a high of 1.17 points for the
sophomore sample in Table 1 to a low of 0.75 for the senior sample in Table 

i: 

"Intellectualism" of the student body. In addition to the respondent's 
SAT score, which was taken to be the best available proxy for an individual's 
innate intellectual "ability," we also obtained data on the intellectual "ability" 
of the entire student body of the sample schools to see whether, independent of a 
particular individual's ability, association with "bright" students or attendance 
at a "high-quality" school that attracts "bright" students and "good" faculty is a 
factor in economic understanding. This consideration was suggested in an earlier 
study by Attiyeh et al. (1971, p. 71), who found that the average freshman entrance
examination score in each school was highly significant and nearly half as 
important to an individual's performance on an economics test as his or her own 
SAT entrance examination score. 
We used three measures of the character of each school's student body and
conducted a sensitivity analysis to determine which of the three was the most
significant for control purposes. One measure was the average SAT scores of
entering freshmen. The other two were "selectivity" and "intellectualism" scales
constructed by Astin (1965).8 The intellectualism scale proved to be the most
useful for our purposes, but the overall results were not affected in any 
substantial way when either of the other two measures was substituted. 

Tables 1 and 2 show that a ten-point difference in the intellectualism of a 
school's students is positively and significantly associated with a difference in 
total test performance in all the regressions shown. The size of the coefficients 
ranges from 0.41 to 0.27. Even for alumni, attendance at a school with an 
intellectualism rating of say 70 rather than 30 is associated with a difference in 
test performance of between 1.64 and 1.08 points, depending upon which other
variables are introduced into the basic regression model. The nature of a school's
student body thus appears to be a significant factor that should not be ignored in
future studies attempting to explain economics test performance. 

Gender. As has been true in many other studies (Siegfried 1979), males 
did significantly better than females on the economics test in all of the 
regressions shown in tables 1 and 2. The size of the coefficient for males ranged
from 1.19 in the "short" alumni regression in Table 2 to 0.71 in the sophomore 
regression in Table 1. 

General attitudes toward economics. All questionnaires contained the 
following three questions, with the responses coded as indicated by the numbers 
in parentheses, although the numbers themselves did not show on the question- 
naire forms: 

How would you rate your present interest in economics as a subject? 
(Check one.) Very High (5), High (4), Average (3), Low (2), Very Low 
(1). 

How important do you think a general understanding of economics is in
today's world? (Check one.) Very Important (5), Important (4), Fairly
Important (3), Fairly Unimportant (2), Very Unimportant (1).

Do you feel that all students should be required to take a course in
economics before they graduate from college? (Check one.) Strongly Agree
(5), Agree (4), Undecided (3), Disagree (2), Strongly Disagree (1).

Each one-point difference on the scale for the first question was positively and 
significantly associated with a difference in total test performance in all of the 
regressions shown. The size of the coefficient ranged from 1.12 in the senior 
regression in Table 1 to 0.72 in the "long" alumni regression in Table 2. The 
responses to the second question were not significantly associated with test 
performance in any of the regressions shown. For the last question, a one-point 
difference on the scale was negatively associated with test performance in all the 
regressions shown. The coefficients, all of which were fairly small, were 
statistically significant in the senior and alumni regressions, but not in the 
sophomore regressions. 

The positive association between interest in economics as a subject and test
performance is not surprising, but it is more difficult to explain the weaker
negative association between a belief that economics should be required and
performance on the hybrid TUCE. Perhaps an awareness of one's lack of
economic understanding is associated with the belief that college students should 
be required to take economics courses. 

Reading habits. All respondents were asked whether they read the 
economic, business, or financial sections of a daily newspaper; a weekly news
magazine such as Newsweek, Time, or U.S. News and World Report; and the
other items shown in tables 1 and 2. For each, a response of "read frequently"
was coded 2; "read occasionally," 1; and "read never or hardly ever," 0. The only
consistent, statistically significant association with test performance was for 
reading the economics section of a weekly news magazine. 

Major course of study. With two exceptions, after adjusting for all the 
other variables in the regressions, a respondent's major course of study does not 
appear to have a strong influence on his or her test performance. Humanities and 
fine arts majors did have consistently negative coefficients, however, and five of 
these coefficients (including all four in the alumni regressions) were statistically 
significant. Five of the negative coefficients for education majors (again including
all four in the alumni regressions) were also significant statistically, and the
size of these coefficients in the alumni regressions (-1.17 to -1.41) may help
explain why the earlier Bach-Saunders studies, confined to social studies 
teachers, most of whom were presumably education majors, failed to find the 
significant lasting effect for introductory economics courses that appears in the 
present study. 

Marital status, occupation, and income. Expanding the "short" alumni 
regression (third column in tables 1 and 2) by introducing the variables on 
marital status, occupation, and income increased the amount of variation 
explained only slightly (fourth column): R² increased by 0.012, or 1.2 percentage
points, in Table 1 and by 0.010, or one percentage point, in Table 2. Readers can
explore for themselves the influence of the individual items in each of these 
categories in tables 1 and 2. Of all of the professions examined separately, for 
example, only the economics profession did significantly better on the total 
hybrid TUCE than did the "other" professions suppressed in the intercept in 
both tables. 

III. Test Performance of Respondents Taking Different Types
of Economics Courses 

Regressions similar to those reported above were run using information on
class size, use of graduate student instructors, textbook used, and respondents' 
ratings of course difficulty, course interest, quality of text, quality of instruction, 
and time spent on introductory economics. None of these additional variables 
had a consistent, statistically significant association with test performance, 
however, and a major disappointment of this study is that we were not able to 
discern any particular types of economics courses, instruction, or textbooks that
had a consistent, significantly different impact from any other type. There is 
apparently no "one best way" to lead all students to economic understanding. 

IV. Student Ratings of Introductory Economics Courses 

Another set of regressions was run using as dependent variables each of the 
five different scales on which the respondents with introductory economics were 
asked to rate and compare these courses with other college courses they had 
taken. While there were again no consistent differences associated with these

THE JOURNAL OF ECONOMIC EDUCATION






TABLE 3

Simple Comparison of Mean Student Ratings of Introductory
Economics Courses in Different Samples a
(Question: Compared to other college courses you took, how would you rate
your college introductory economics course on each of the following items?
[5 = "much more" or "one of the very best," 4 = above average, 3 = average,
2 = below average, 1 = "much less" or "one of the very worst"])

                                               Seniors                     Alumni
                                        Intro.   Beyond            Intro.   Beyond
Rating Items                    Soph.     Only   Intro.    Total     Only   Intro.    Total

Difficulty of subject matter     3.27     3.20     3.29     3.24     3.04     2.96     3.00
                                (.83)    (.88)    (.80)    (.84)    (.94)    (.80)    (.88)

Interest of subject matter       3.28     2.69     3.54     3.12     2.72     3.36     3.00
                                (.97)    (.97)    (.94)   (1.04)   (1.01)   (1.01)   (1.06)

Quality of textbook              3.36     3.20     3.47     3.34     3.27     3.61     3.42
                                (.83)    (.93)    (.99)    (.97)   (1.00)    (.87)    (.96)

Quality of instruction           3.71     3.18     3.67     3.44     2.98     3.32     3.13
                                (.95)   (1.19)   (1.11)   (1.17)   (1.10)   (1.02)   (1.08)

Time you actually spent on       3.08     2.85     3.29     3.08     2.70     2.96     2.81
  the course                    (.91)    (.94)    (.87)    (.93)    (.97)    (.81)    (.91)

No. of respondents                685      534      273      807      464      358      822

a Standard deviations are shown in parentheses below the means.



measures and the different "type-of-course" variables mentioned above, the data 
collected nevertheless provide some interesting information on students' opinions
of the introductory economics courses offered during the 1960s (see Table 3).

As indicated in Table 3, the respondents in all three samples who had taken 
a course in introductory economics were asked to compare this course with other 
college courses in terms of difficulty, interest, quality of text, quality of 
instruction, and time actually spent on the course. The five possible responses 
and their codes are listed in the table. 

Most of the mean ratings in Table 3 are greater than the "average" value of 
3.00, and the seniors and alumni who took additional courses beyond the 
introductory level consistently rated the course higher than those who took only 
the introductory course. Enough data are provided in the table to permit the 
calculation of tests of statistical significance for any particular comparisons in 
which the reader might be interested, but a few words of caution should be 
emphasized. Like the test scores discussed above, the student ratings should be 
interpreted on a cross-sectional rather than a time-series basis. Different people
were rating the same courses at different times. Also, the framework of "other 
college courses" being used for comparison is larger for the senior and alumni 
respondents since it contains junior- and senior-year courses not yet taken by the 
respondents in the sophomore sample. The alumni also had to think back over a 
longer period of time than the seniors and sophomores. Finally, although our
study indicated no major changes, the courses rated may themselves have 
changed somewhat over the years. 
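
The kind of significance calculation the table is said to permit can be sketched from the reported means, standard deviations, and group sizes alone. The example below is a minimal illustration, not the authors' procedure: the choice of Welch's unequal-variance t statistic is an assumption, the function name `welch_t` is hypothetical, and the number of respondents in each group is assumed to equal the number who answered the item. The figures are the senior-sample ratings for "Interest of subject matter" as read from Table 3.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic and approximate degrees of freedom,
    computed from summary statistics only."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Seniors who went beyond the introductory course vs. those who took it only.
t_stat, dof = welch_t(3.54, 0.94, 273, 2.69, 0.97, 534)
print(f"t = {t_stat:.2f} with about {dof:.0f} degrees of freedom")
```

The same function can be applied to any other pair of columns in the table, subject to the cautions about cross-sectional comparisons given above.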
V. Concluding Comments and Implications

Teachers of introductory economics can take some encouragement from the
results of this study. While there is little room for complacency, their efforts
have not been completely in vain, and introductory economics courses do appear
to have some impact that lasts beyond the final exam. A difference of over three 
points on the hybrid TUCE some seven years after a sophomore course has been 
completed is not only statistically significant, it also seems to be educationally 
important. This is over 50 percent of the difference associated with taking an 
introductory course immediately after the course is completed. This finding 
casts some doubt on the Stigler hypothesis quoted above. Although it is true that 
we did not use the type of essay questions he originally had in mind, Stigler 
himself was a member of the original TUCE committee, and thus had a hand in
formulating the test questions used in this study. It is also true that the hybrid 
TUCE used in this study puts a much heavier emphasis on realistic application 
questions than any alternative testing instrument available. 

While our findings cast doubt on the prediction of no difference in student
performance, they do lend some support to a basic point Stigler was making in
criticizing "the watered-down encyclopedia which constitutes the present course
in beginning college economics" which, he argued, "does not teach the student
how to think on economic questions" (p. 657). The implications in our results
that the retention rate is higher on SA and CA questions than on RU
questions support Stigler's statement that "an introductory-terminal course in
economics makes its greatest contribution to the education of students if it
concentrates upon a few subjects which are developed in sufficient detail and
applied to a sufficient variety of actual economic problems to cause the student
to absorb the basic logic of the approach" (p. 658). Our data also imply that
introductory economics courses might well devote more explicit attention to
increasing students' interest in economics as a subject and encouraging them to
make a habit of reading the economics or business sections of weekly news
magazines, since these factors appear to exert an independent influence on
economic understanding. With economic events and reporting increasingly
moving from the business page to the front page, we may have an opportunity to
significantly extend the lasting effectiveness of our introductory economics
courses.



FOOTNOTES 

1. In Stigler's own words, the complete hypothesis was stated as follows: "I propose the
following test. Select an adequate sample of seniors (I would prefer men five years
out of college), equally divided between those who have never had a course in
economics and those who have had a conventional one-year course. Give them an
examination on current economic problems, not on textbook questions. I predict they
will not differ in their performance. I shall illustrate below the kind of question that
should be asked in this test" (p. 657).

"Give the student a summary page or two of the arguments and evidence 
presented in the discussion (in Congress and the public press) of HR 5983 [regarding
turkeys] and let him explain benefits and costs of the scheme— with the grading 
based, of course, on the coherence of his argument and relevance of his evidence, not
on the conclusions reached" (p. 659). 

2. The original TUCE was developed by a distinguished committee consisting of
Rendigs Fels, Vanderbilt University, chairman; G. L. Bach, Stanford University;
William G. Bowen, Princeton University; R. A. Gordon, University of California
(Berkeley); Bernard F. Haley, University of California (Santa Cruz); Paul A.
Samuelson, Massachusetts Institute of Technology; George Stigler, University of
Chicago; John M. Stalnaker, National Merit Scholarship Corporation, consultant;
and Paul L. Dressel, Michigan State University, executive director. The four versions
of the original TUCE are described and student performance data are discussed in
Fels (1967), Psychological Corporation (1968), and Welsh and Fels (1969).

3. See Saunders and Welsh (1975). Copies of the test, questionnaires, and specific
details of the sampling procedure used in this study are available in Saunders (1973) 
and can be obtained by writing to the author. 

4. Students with only a one-semester introductory economics course, or students who 
took only one semester of a two-semester sequence, were omitted from the study, 
which was designed to test the effectiveness of the "typical" two-semester college 
introductory course. 

5. The data in tables 1 and 2 also indicate that taking introductory economics on a 
required rather than an elective basis has a consistent negative association with total 
test performance, but only three of the eight coefficients were statistically significant 
at the .05 level. 

6. A series of regressions identical to those shown in tables 1 and 2 was run with the
scores on each of the sets of eleven RU, SA, and CA questions, rather than the total
hybrid-TUCE scores, as the dependent variable. Of the total difference of 2.94 points
associated with an introductory economics course between the sophomore regression
and the long alumni regression (6.18 - 3.24 = 2.94) there was a difference of 1.25
points (42.5 percent of the total) on the 11 RU questions, a difference of 0.84 points
(28.6 percent of the total) on the 11 SA questions, and a difference of 0.85 points
(28.9 percent of the total) on the 11 CA questions.

Looked at in another way, on the 11 RU questions, the difference associated
with an introductory course declined from 2.44 points in the sophomore regression to
1.19 points in the alumni regression (a change of 1.25 points or 51.2 percent of the
sophomore difference); on the 11 SA questions the difference associated with an
introductory course declined from 2.03 points in the sophomore regression to 1.19
points in the alumni regression (a change of 0.84 points or 41.4 percent of the
difference in the sophomore regression); on the 11 CA questions, the difference
associated with an introductory course declined from 1.71 points in the sophomore
regression to 0.86 points in the alumni regression (a change of 0.85 points or 49.7
percent of the difference in the sophomore regression).

Similar results were obtained in regressions using course grades as the introductory
economics variable.
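
The decomposition in this footnote can be checked with a short Python sketch. The six coefficients are those quoted in the footnote; the variable names and the added "retention" column (alumni difference divided by sophomore difference) are assumptions introduced here for illustration.

```python
# Differences associated with an introductory course, by question type,
# as quoted in footnote 6: (sophomore regression, long alumni regression).
effects = {
    "RU": (2.44, 1.19),
    "SA": (2.03, 1.19),
    "CA": (1.71, 0.86),
}

total_soph = sum(soph for soph, _ in effects.values())
total_alum = sum(alum for _, alum in effects.values())
total_drop = total_soph - total_alum

summary = {}
for kind, (soph, alum) in effects.items():
    drop = soph - alum
    summary[kind] = {
        "drop": drop,                              # points lost between samples
        "share_of_total_drop": drop / total_drop,  # first paragraph of the footnote
        "decay": drop / soph,                      # second paragraph of the footnote
        "retention": alum / soph,                  # complement of the decay share
    }

for kind, row in summary.items():
    print(
        f"{kind}: drop {row['drop']:.2f}, "
        f"{row['share_of_total_drop']:.1%} of total drop, "
        f"decay {row['decay']:.1%}, retention {row['retention']:.1%}"
    )
```

On these figures the implied retention share is highest for the SA questions and lowest for the RU questions, with the CA questions only slightly above RU.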

7. Each SAT score has a maximum value of 1,600, the sum of the scores for verbal 
aptitude and for mathematical aptitude, each with a maximum value of 800. 

8. Astin's "intellectualism" scale is expressed as a range of "T-scores" with a mean of
50 and a standard deviation of 10, and it is constructed such that "an entering
student body with a high score would be expected to be high in academic aptitude
(especially mathematical aptitude) and to have a high percentage of its students
pursuing careers in science and planning to go on for Ph.D. degrees" (p. 54). The
unweighted mean of the schools in the lasting effects study on Astin's intellectualism
scale was 54.52, with a range of 79 to 27. This indicates that the schools represented
in our study are slightly above the average of all U.S. colleges and universities in
terms of the intellectualism of their student bodies; but our sample covers a broad
range, and there is no reason to believe that it is not adequate or representative for 
our purposes. 

9. A discussion of a partial preliminary version of these findings, along with a discussion 
of the methodological problems of comparing results from the different samples used 
in this study, appears in Saunders (1971) and a more extensive methodological 
discussion is contained in Saunders (1973). 
IS TEACHING THE BEST WAY TO LEARN?
AN EVALUATION OF BENEFITS AND COSTS TO 
UNDERGRADUATE STUDENT PROCTORS IN ELEMENTARY 

ECONOMICS* 



I. INTRODUCTION

The Joint Council on Economic Educa- 
tion (JCEE) has recently sponsored several 
projects to explore alternative approaches to 
teaching college introductory economics. 
The experimental course at Vanderbilt, under
the supervision of Rendigs Fels,1 combines
the case method of instruction2 with
the self-paced personalized system of instruction
(PSI) developed by Fred S. Keller [5].

The course consists of twenty lessons. 
Each lesson includes a short examination on 
basic concepts (seven exams), analytical 
skills (seven exams), or policy cases (six 
exams). The course has no lectures. Students 
are given assignments, study them with the 
help of a proctor, and take the examinations 
when they think they are prepared. The crite- 
rion for passing is mastery— one hundred 
percent. Students who do not meet the crite- 
rion are recycled, and, after additional study 
may retake tests on lessons they fail to mas- 
ter. This process continues until each lesson 
is passed or the semester ends. Grades de- 
pend partly on how many lessons are com- 
pleted during the semester and partly on a 
final examination. The case method is in- 

* This evaluation is part of a larger project evaluating 
the personalized self-paced case-method of instruction 
of elementary economics being conducted experimen- 
tally by Vanderbilt University and the Joint Council for 
Economic Education. Financial support was received 
from the JCEE, the Ford Foundation, and Vanderbilt. 
Lee Wehby, Mike Cook, Tom Overstreet, and Martin
Rim provided clerical assistance. Rendigs Fels and Ste- 
phen Strand contributed to the project from its in- 
ception to final reporting of the results. 

1 An extensive description of the course, including all 
materials necessary to replicate it, is contained in [3].

2 The essence of the case method is requiring students
to systematically think through real-world problems for
themselves. A discussion of the case method is included
in [2]. Many of the cases used in the course are reported

:RlC inW '' 

SOURCE: Southern Economic Journal, vol. 43, no. 3, January 1977, pp. 1394-1400. Reprinted with permission of the



tegrated into the PSI method in the final six
lessons. In these lessons students analyze realistic
policy issues in a systematic way.

There has been a substantial amount of
research on PSI courses summarized in [6].
Most of this research has been conducted in
psychology classes. Elizabeth Allison has recently
reviewed the application of PSI principles
to college level economic instruction
[1]. Scott McCuskey has evaluated an adaptation
of the Vanderbilt-JCEE course in his
Ph.D. thesis [7].

The results of evaluations of student learning
in PSI courses are inconclusive, although
PSI students seem to do at least as well as
conventionally instructed students on standardized
multiple-choice examinations. In
general students report that they enjoy PSI
courses more than traditional courses and
usually think that they have learned more
than they would have in a conventional
course (but the objective data do not always
support this hypothesis). Instructors commonly
report that it is exciting to teach a PSI
course and there is some scattered evidence
that PSI courses produce a higher percentage
of students concentrating in the field of the
course.

The economics of PSI courses has not
been examined as carefully as their educational
value. PSI courses require significant
start-up costs plus substantial instructional
resources during their operation. The common
organization of PSI courses has one
instructor supervising a group of proctors,
who in turn each tutor up to ten students.
Thus for a course of one hundred students
there would be at least eleven teachers. This
is generally beyond the cost capabilities of
most colleges and universities. For this reason,
and because many faculty believe that
one learns best when one is instructing a
course, most PSI courses have utilized un- 
dergraduate students as proctors. Only in 
this way can the course be made cost-effec- 
tive. 

Allison says that after the course has been
developed it requires less time from the instructor
than a conventional lecture course
[1, 8]. Fels contends that the supervision of
the PSI course plus the supervision of the
proctors means that such courses place a
heavier burden on the instructor than comparable
conventionally taught courses [3,
10]. He argues that only if the instructor is
given teaching credit for the proctors as
well as for the students enrolled in the PSI
course do the benefits exceed the costs to
the instructor. Some schools reward the 
undergraduate students with course credit; 
others compensate the proctors financially. 
Whether the instructor is given teaching 
credit for the proctors is intimately con- 
nected with the form of compensation re- 
ceived by the proctors. 

If teaching credit for supervising the proctors
and compensating proctors with course
credit rather than cash are necessary to make
the course cost-effective to both the school
and the instructor, it is critical that the
educational content of the proctoring experience
be evaluated. While PSI courses commonly
use students as proctors, and many
assertions are made that the student-proctor
learns more from this activity than available
alternatives, there have been no systematic
empirical efforts to validate these claims.

This study is an attempt to evaluate the
costs and benefits that accrue to proctors
in the Vanderbilt-JCEE PSI case-method
course.3 Twenty-one proctors were employed
during the 1974-1975 academic year. They
answered students' questions, administered
tests, and gave provisional grades to tests
and papers. Each proctor was responsible for
about four to five students. The proctors
were all undergraduate students, primarily



3 A complete evaluation of the Vanderbilt-JCEE
course on additional criteria is contained in [8]. 



juniors and seniors with good grade records.
Most of them had previously taken inter- 
mediate microeconomics and intermediate 
macroeconomics. The proctors took each of 
the examinations themselves from Fels prior 
to administering them to enrolled students. 
They were compensated for their services 
with three semester hours of credit. 

Each proctor was interviewed immediately 
after completing the proctoring experience. 
In addition, examination scores for the 
spring proctors and a control group are ana- 
lyzed. Multiple linear regression analysis is 
used to examine the hypothesis that students 
develop a more complete understanding of
material by instructing others. The evidence 
supports that hypothesis. 

II. INTERVIEW RESULTS 

All twenty-one proctors thought that they 
learned more economics by proctoring than 
they would have learned in an upper level 
economics course. Seventeen of the twenty- 
one thought that academic credit was the 
appropriate form of compensation for proc- 
toring. The strongest argument against 
awarding academic credit for proctoring is 
that the economic principles learned while 
proctoring should have been learned when 
the proctors took the elementary course 
themselves. The counter-argument is that it 
is not necessary to master anything close to 
one hundred percent of the material pre- 
sented to students in conventional elemen- 
tary economics classes in order to earn an 
A-. An A- level of understanding is far 
below that necessary to teach the principles 
to other students. The academic credit for 
proctoring is for learning "new material" 
with which the proctors were previously 
familiar, but which they did not completely 
understand. The academic credit is for 
changing "familiarity" into "understanding" 
of much of the material in the elementary 
course. 

The student-proctors suggested many ben- 
efits from the experience that extend beyond 
their improved understanding of economic 
principles. They commonly cited an improvement
in the clarity of their verbal expression,
insight into the teaching process
and how people learn things, experience they 
obtained in motivating people, and the inter- 
esting (and sometimes trying) experience of 
recycling people. 

Fourteen of the twenty-one proctors iden- 
tified time as the predominant cost to them 
of proctoring. Many of the proctors claimed 
that proctoring took about twice as much 
time as studying for an alternative upper 
level economics course. Several proctors 
complained about the distribution of time
required. They apparently were frustrated by 
their lack of control over their personal time. 
They had to take the examinations them- 
selves at least as rapidly as their most enthu- 
siastic student and be available to students 
when they needed help, which sometimes in- 
terfered with the proctors' study plans for 
their other courses. 

There were very few costs identified by 
proctors other than time. A few mentioned 
inevitable personality conflicts. The respon- 
sibility for other people's progress eliminates 
proctors' options to "just go for a C." Some 
proctors indicated that recycling students 
caused them some emotional strain. Only 
one proctor thought that proctoring hurt his 
grades in other courses that he was taking 
simultaneously. Most of the proctors recog-
nized a trade-off between time and pressure.
They perceived less pressure on them as 
proctors than there would have been in an 
upper level economics course. There were 
fewer crises (i.e., exams). One proctor put it 
bluntly: "more time, but less sweat." 

III. CONTROLLED EXPERIMENT 

An empirical evaluation of the impact of 
proctoring on understanding economic prin- 
ciples was conducted in spring 1975. To con- 
trol for initial understanding a test of eco- 
nomic principles was administered to ten 
proctors and twenty matched students prior 
to the semester. All of these students took
the same examination at the end of the se- 



mester, after all classes had been completed, 
but prior to final examinations. 

The test instrument was the new (1975)
introductory economics test (micro and 
macro) in the College Level Examination 
Program (CLEP) of the Educational Testing 
Service. This is a 100-item, 90-minute mul-
tiple-choice test of economic principles. It
stresses understanding of abstract economic 
theory. It is a difficult examination of supe- 
rior quality and thus was particularly suit- 
able for our purpose, since we wished to 
discriminate among different levels of under- 
standing at relatively high levels of compe- 
tence. However, the CLEP examination does 
not test skill in applications, which consti- 
tutes about one-fourth of the experimental 
course. All students were paid a flat fee of ten 
dollars to take the two exams. Students were
motivated to perform well on the CLEP ex-
amination by offering cash prizes of up to ten
dollars to the top fifteen scorers on each 
exam, the size of the prize being related to 
performance. 

The ten proctors were selected by Fels 
prior to the spring semester. The twenty con- 
trol group students were matched two to a 
proctor on the criteria of cumulative grade
point average, year in school, background in 
intermediate microeconomics and inter- 
mediate macroeconomics, major, and num- 
ber of previous economics courses taken. To 
test the similarity of the control and experi- 
mental groups we compared their perform- 
ance on the CLEP examination before the 
semester. The proctors averaged 72.4, while 
the control group averaged 72.6. There is no 
statistically significant difference between 
these means. On the basis of this evidence, 
and considering our matching procedure, we 
concluded that the experimental and control 
groups originated from the same population 
and thus could be used in pooled regression 
analysis. 

The index of performance measuring im- 
proved understanding of economics prin- 
ciples during the semester is the difference 
between the CLEP score after the semester
and the CLEP score before the semester. 
This is a value added measure; it controls for
students' initial understanding of economics 
principles. Since the goal is to relate im- 
provement in understanding of economics 
principles during the semester of proctoring to
the proctoring activity, a value added mea- 
sure is most appropriate. However, because 
students who perform particularly well on 
the pre-test cannot score very well on a value 
added measure we used an alternative mea- 
sure of improvement in understanding of 
economics principles, value added divided by 
the number of points that were available to 
be added during the semester. 4 Symbolically 
the measures of performance are: 

$P_1 = T_2 - T_1$ and $P_2 = (T_2 - T_1)/(100 - T_1)$

where $T_i$ is the score on the $i$th test and $P_j$ is
the $j$th performance index.
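
The two performance indices described above (value added, and value added divided by the points still available after the pre-test) can be illustrated with a minimal sketch. The function names and the two sample students are hypothetical and are not data from the study; the 100-point maximum is assumed from the 100-item CLEP test.

```python
def value_added(pre: float, post: float) -> float:
    """P1: post-test score minus pre-test score."""
    return post - pre


def share_of_potential_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """P2: value added divided by the points still available after the pre-test."""
    if pre >= max_score:
        raise ValueError("no points left to gain; P2 is undefined")
    return (post - pre) / (max_score - pre)


# Hypothetical students (not data from the study): both gain 6 points,
# but the stronger pre-test scorer had far less room to improve.
weak = (60.0, 66.0)
strong = (90.0, 96.0)
for pre, post in (weak, strong):
    p1 = value_added(pre, post)
    p2 = share_of_potential_gain(pre, post)
    print(f"pre={pre} P1={p1} P2={p2:.3f}")
```

The two students are indistinguishable on the first index but differ sharply on the second, which is the ceiling effect the alternative measure is meant to correct.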

A simple comparison of mean performance
between the two groups reveals that the
proctors did appreciably better than the con- 
trol group. These results are reported in 
Table I. The proctors' mean is significantly 
higher than the conventional students' mean 
at the 0.05 significance level. The pre-test 
scores are essentially identical. On the aver- 
age the proctors gained 9.4 points during the 
semester while the control group gained 3.5 
points. 

There may have been significant differ- 
ences in the learning experience, environ- 
ment, or background of students that are not 
controlled in a simple comparison of mean 
scores. In particular, different students take
different numbers of courses. In addition, it 
may be important to see if there are system- 
atic differences in the other standard factors 
that might explain the difference in perform- 
ance between proctors and control students. 
We expect most of the control variables like 
grade point average, previous intermediate 
theory courses, cumulative number of eco- 
nomics courses, and year in school to show 
little impact on differential performance be-
cause we initially matched the proctor and

4 This measure was, to my knowledge, first proposed
by Whitney [9].



control samples on the basis of these criteria. 
Other control variables include SAT scores,
sex, mathematics background, other work- 
load during the semester, and number of 
other economics courses during the semester. 
These variables have all been examined in 
much detail in the literature of economics 
education and most of the hypotheses relat- 
ing them to learning are self-evident. There- 
fore the results of the multiple linear regres- 
sions are reported in Table II without further 
elaboration. 

The regression results indicate that only 
proctoring and the number of economics 
courses that a student took during the exper- 
imental semester had a significant effect on 
$P_2$. Not one of the remaining control varia-
bles was statistically significant at the .05
level. This provides support for the con- 
tention that proctoring in the PSI elementary 
course provides substantial improvement in 
the understanding of economic principles by 
the proctors. In addition, it indicates that 
alternative economic courses are of value for 
achieving the same goal. It is somewhat 
comforting to discover that the major de- 
vice universities employ to improve student 
understanding demonstrates a statistically 
significant impact in the expected direction 
and that other environmental and socio-de- 
mographic variables do not, by themselves, 
explain much of the difference in improved 
understanding among students. 

Because we expected proctoring and alter- 
native economics courses to improve per- 
formance scores, these variables are sub- 
jected to one-tail statistical significance tests. 
Age, sex, SAT scores, semester hours, and 
total economics courses are subjected to a 
two-tail statistical significance test because
there are competing hypotheses with respect 
to the predicted sign of the coefficient of each 
of these variables. In additional regression 
models a dummy variable for whether a stu- 
dent had taken calculus, students' grade 
point average, and a series of dummy varia- 
bles for whether students had taken inter- 
mediate micro-economic theory, inter- 
mediate macroeconomic theory, and the 
TABLE I

Mean Performance of Proctors and Control Students on CLEP Examination

Performance                      Proctors   Control     t-ratio for difference
                                            Students    between means

Pre-test score                     72.4       72.6          0.4
P1 (value added)                   +9.4       +3.5          2.82**
P2 (value added / potential)       +0.333     +0.155        2.20**
Number of students                  10         20

** = statistically significant at .01 level.



grades they obtained in the intermediate the- 
ory courses proved to be statistically non- 
significant. This result was expected since 
students were matched in the two samples on 
the basis of grade point average and inter- 
mediate economic theory experience. Since
the presence of the other control variables 
did not affect the behavior of the coefficients 
or standard errors of the variables included 
in regression (2), the details of these other 
regressions are not reported. 
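
The one-tail versus two-tail distinction can be checked against the t-ratios reported in Table II. The sketch below is only illustrative: it assumes each regression includes a constant term, so that the residual degrees of freedom are 30 - 2 - 1 = 27 for regression (1) and 30 - 8 - 1 = 21 for regression (2), and it uses standard tabulated Student-t critical values at the 0.05 level rather than computing them. The helper name `is_significant` is hypothetical.

```python
# Tabulated Student-t critical values at the 0.05 level, keyed by
# (residual degrees of freedom, number of tails).
# Degrees of freedom assume n = 30 and a constant term in each regression
# (an assumption, not stated in the text): 30 - 2 - 1 = 27 and 30 - 8 - 1 = 21.
CRITICAL_T = {
    (27, 1): 1.703,
    (21, 1): 1.721,
    (21, 2): 2.080,
}


def is_significant(t_ratio: float, df: int, tails: int) -> bool:
    """Compare a reported t-ratio with the tabulated 0.05 critical value."""
    critical = CRITICAL_T[(df, tails)]
    if tails == 1:
        # One-tail test: only an effect in the expected (positive) direction counts.
        return t_ratio > critical
    return abs(t_ratio) > critical


# (label, t-ratio as reported in Table II, degrees of freedom, tails)
checks = [
    ("Proctor, regression (1)", 1.95, 27, 1),
    ("Concurrent courses, regression (1)", 0.99, 27, 1),
    ("Proctor, regression (2)", 2.40, 21, 1),
    ("Concurrent courses, regression (2)", 1.81, 21, 1),
    ("SAT verbal, regression (2)", 1.69, 21, 2),
    ("Semester hours, regression (2)", -1.24, 21, 2),
]
for label, t_ratio, df, tails in checks:
    print(f"{label}: t = {t_ratio}, significant = {is_significant(t_ratio, df, tails)}")
```

Under these assumptions the concurrent-courses coefficient in regression (2) clears the one-tail threshold but would not clear the two-tail one, so its starred status depends on the directional hypothesis.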

Because the number of current semester 
courses in economics is not significant in 
regression (1), the coefficient should be inter-
preted as zero. Thus a comparison between
proctoring and taking advanced economics 
courses will obviously favor proctoring in 
regression (1), since there is no impact of
advanced economics courses. However, it is 
appropriate to control for those factors that 
were not considered in the matching pro- 
cedure. Therefore, regression (2) should be 
examined. 

The results of regression (2) indicate that 
an additional economics course during the 
semester is associated with a 7.8 percent in- 
crease in percent of potential value added to 
the CLEP score. Proctoring is associated 
with a 20.8 percent increase in percent of
potential value added achieved. According 
to these figures, proctoring is associated with 



2.7 times the increase in CLEP performance 
during the semester than would result from 
taking one advanced economics course. If 
the relative cost to the students of these two 
alternatives is two to one (many proctors 
stated that the time demand of proctoring 
was twice that of an advanced course), then 
in terms of the test criterion proctoring 
seems more favorable than an advanced eco- 
nomics course by a ratio of about three to 
two. 
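
The arithmetic behind this comparison can be sketched as follows. The coefficients are those reported for regression (2); the two-to-one cost ratio is the proctors' own rough estimate, and the variable names are illustrative only.

```python
# Coefficients from regression (2): change in the share of potential
# value added associated with each activity.
proctor_effect = 0.208
course_effect = 0.078

# Benefit of proctoring relative to one additional economics course.
relative_benefit = proctor_effect / course_effect

# Time cost of proctoring relative to an advanced course, as estimated
# informally by the proctors (about twice as much time).
relative_cost = 2.0

# Benefit per unit of student time, proctoring relative to a course.
benefit_per_unit_cost = relative_benefit / relative_cost

print(round(relative_benefit, 1))       # about 2.7
print(round(benefit_per_unit_cost, 2))  # about 1.33
```

The cost-adjusted ratio comes out near 1.3, which the text describes loosely as "about three to two"; either way the comparison favors proctoring on the test criterion.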

The analysis indicates that course instruc- 
tion and proctoring were the only significant 
determinants of how much economic theory 
was learned by upper-class students. How- 
ever, several caveats are in order. First, these 
conclusions come from a relatively small 
sample of students in one discipline at one 
university. Whether they can be applied to 
other educational environments is problem- 
atical. On the other hand, these empirical 
data are, to our knowledge, the first system- 
atic attempt to test the thesis that the most 
effective way to learn economics is to instruct 
it. In an area with an abundance of ad hoc 
theorizing and self-appointed experts, bring- 
ing systematic empirical data to bear on the 
question must be considered progress. Sec- 
ond, only linear models were tested. For ex- 
ample, we did not consider whether the im- 
pact of an advanced economics course on 
TABLE II

Multiple Regression Results. Dependent Variable is (POSTTEST-PRETEST)/(100-PRETEST)

Variable                (1)        (2)        Measure, [Mean]

Proctor                 .158*      .208*      proctor = 1, otherwise = 0, [.333]
                        (1.95)     (2.40)

Concurrent Economics    .038       .078*      number of spring 1975 economics
Courses                 (0.99)     (1.81)     courses, [2.87]

Age                                -.012      years (proxy for year in school),
                                   (-1.05)    [19.67]

Sex                                .073       male = 1, female = 0, [.70]
                                   (0.70)

SAT verbal                         .0016      numerical score, [669]
                                   (1.69)

SAT quantitative                   .0004      numerical score, [579]
                                   (0.51)

Semester hours                     -.021      semester credit hours in spring 1975,
                                   (-1.24)    [14.1]

Cumulative Economics               .006       cumulative economics courses as
Courses                            (0.48)     of June 1975, [10.5]

R² (coefficient         .183       .497
of determination)

F (Fisher's             3.0*       2.5*
F-ratio)

n = 30

* = statistically significant at 0.05 level.



improving the understanding of economic 
theory depends on whether it is the first, 
second, third, or fourth economics course 
taken during the semester. Some people 
would argue that there is a threshold effect, 
that is, a minimum number of courses neces- 
sary to have any impact at all. Others would 
argue that there are diminishing marginal 
returns to improving one's understanding of 
economic theory with respect to additional 
courses. A non-linear impact might also be 
expected from several of the other variables 
in our regressions. Alternative functional 
forms have not been examined empirically 
because of the limited size of the sample. 



Third, the empirical test explored the relative 
effect of proctoring and advanced courses
only on improving students' understand-
ing of economic theory. There are other
worthwhile goals of advanced econom- 
ics courses— providing familiarity with in- 
stitutions, generating enthusiasm for eco- 
nomic inquiry, exposing students to research 
methods, etc. None of these objectives are 
measured effectively by the test instrument. 
On the other hand, there are also many bene- 
fits of proctoring that go beyond the im- 
proved understanding of economic theory. 
Proctors presumably improve their skills in 
analyzing realistic policy cases; they learn to 
deal with other people in conflict situations;
they are forced into self discipline and de- 
velop patience. These factors may easily be 
as important to students' future life experi-
ence as learning about economic institutions
or research methodology. We have evaluated 
only one dimension of the multi-dimensional 
output of university instruction. This com- 
parison can be useful if it permits an objec- 
tive evaluation of alternative instructional 
techniques on this single dimension. Then 
more effort can be directed toward com- 
parative evaluation of the other dimensions 
of instruction and the all important prob- 
lems of how these different dimensions 
should be weighted in instructional deci- 
sions. 

IV. CONCLUSION 

Based on interviews of proctors and on 
empirical analysis, it is concluded that proc- 
toring is an effective means for students to 
improve their understanding of economic 
principles. The twenty-one proctors during 
1974-1975 unanimously voiced the view that 
they learned more economics by proctoring
than by taking an alternative advanced eco- 
nomics course. An empirical study of ten 
proctors and twenty control students in- 
dicates that the proctors would improve their 
scores on an economics exam 2.7 times the 
improvement by students who elected to take 
an advanced economics course instead. On 
the basis of this evidence an argument can be 
made for compensating proctors for their 
services with academic credit. 

Studies which conclude that students in 
PSI courses perform no differently from stu- 
dents in conventional lecture-discussion 
courses have failed to enumerate an impor- 



tant set of benefits that accrue to the student- 
proctors. This supports a case for employing 
students (rather than professional instruc- 
tors) as proctors for PSI courses. If student-
proctors are compensated with course credit
and the instructor is given teaching credit for 
the proctors, PSI courses stand a good
chance of being cost effective. The evidence 
uncovered in this analysis suggests that there 
is more learning of economic theory taking
place during proctoring than during alterna-
tive economics courses, which provides a
logical foundation for compensating under- 
graduate student-proctors in PSI courses 
with academic credit. 

JOHN J. SIEGFRIED 

Vanderbilt University 
Teacher Effectiveness and
Student Performance 



Howard P. Tuckman 



This paper evaluates the effects of using graduate instructors rather than experienced 
faculty in the macroeconomics principles course. Because a single measure of teacher 
effectiveness may cause an increase in "measured" rather than "desired" output [1], we 
employ several measures of effectiveness. The first (Q1) is student score on a ten-question
version of the Test of Understanding in College Economics (TUCE). 1 The second (Q2) is
student score on a twenty-question economic attitudes (AS) test, constructed along the lines 
suggested by Mann and Fusfeld (MF) [4] . Measures of this type are not free of value judgments 
and do not allow for the theoretical divisions which currently exist within the profession. 
However, economic attitudes are changed in the principles class and an effort should be made to 
measure the change, as MF found that an economic attitudes test captures learning changes in 
subject matter-oriented students while grades are better measures of effectiveness in teaching
success-oriented students. 

Final grade (Q3) for the quarter (A = 4, B = 3, . . . , D = 1) is the third measure of
effectiveness. A student's grade reflects the instructor's judgment of how well he performs. If 
some instructors differ substantially in their grading procedures or in their ability to teach, this 
has a bearing on their effectiveness and is germane to our study. The fourth measure (Q4) is an
index of student interest in economics (5 = very high, 4 = high, . . . , 1 = very low) based on a
set of questions asked of students at the end of the course. The final measure (Q5) is a measure of
student willingness to take another course in economics (1 = student intends to take another
course, 0 = otherwise). The latter two measures are intended to capture the instructor's ability
to interest students in the materials. 

Zero-order correlation coefficients for each measure are shown in Table 1. Note that the
largest correlation (between AS and TUCE) is only 0.28. Moreover, a consistent pattern 
emerges; the correlations between the AS and other measures are generally higher than between 
either the TUCE and other measures or the final grade and other measures. 

Data Description 

Beginning in the 1972-73 winter quarter and for five successive quarters thereafter, 
students in the macroeconomics principles courses at Florida State were asked to complete two 
exams (TUCE and AS) and a detailed questionnaire at both the beginning and end of the 
quarter. A total of 612 students were tested. 2 Of these 548 or about 90 percent provided usable 
data. 3 A total of 12 classes were included in the sample, taught by one full professor, two 
associates, two assistants, and three graduate instructors. On the basis of prior studies in this 
area the following independent variables were utilized: X1 = grade point average (5 = 3.5-4.0,
Howard P. Tuckman is Associate Professor at the Institute for Social Research, The Florida State 
University. 
Table 1

Zero-Order Correlations Among the Dependent Variables

                 TUCE     AS       Final    Interest in   Continuation
                                   Grade    Economics

TUCE                      0.28     -.07     -.12          .01
AS                                 .20      -.17          .07
Final grade                                 .07           .04
Interest                                                  .13
Continuation



4 = 3.49-3.0, . . . , 1 = 1.99 or less), X2 = score on TUCE pretest, X3 = score on AS pretest,
X4 = sex (1 = female, 0 = otherwise), X5 = major (1 = economics or natural science, 0 =
otherwise), X6 = precourse interest in economics (5 = very high, 4 = high, . . . , 1 = very
low), X7 = class standing (5 = graduate student, 4 = senior, . . . , 1 = freshman), X8 =
precourse willingness to take another nonrequired course in economics (1 = if willing, 0 =
otherwise), Y1 = number of years since instructor received his Ph.D. (a third-year graduate
student is two years away from receiving his Ph.D. and thus enters the regression with a -2),
Y2 = number of hours taught by the instructor during the quarter, Y3 = graduate instructor (1 =
if a graduate student, 0 = otherwise).

The Role of Experienced Teachers 

Several studies have shown that teacher experience affects student performance. High 
school students taught by experienced teachers receive higher scores on national tests than 
those taught by less experienced teachers [3]. Likewise, experienced teachers produce fewer
dropouts and more students who wish to pursue their education [6]. While the evidence for 
college teachers is less extensive, it appears that a teacher's experience affects the performance 
of his or her students. 

Regression analysis is used to examine the relationship between teacher experience and 
student performance, controlling for the effects of student and other nonteacher-related 
characteristics. Each column of Table 2 gives a regression equation for one of the Qj outputs;
each row shows the regression coefficients for one of the independent variables, along with its
t-value (in parentheses below the coefficient 4 ). How does teacher experience affect student
performance? The years beyond Ph.D. coefficient (Y1) is statistically significant only in the
TUCE and AS equations. At the sample mean of 7.6 years of experience, the average
post-TUCE score is raised by 0.3 point or 7 percent above the pre-TUCE mean score. Post-
AS score rises by 0.65 point or about 6 percent above the pre-AS mean score. These
computations do not suggest that experienced teachers contribute a great deal to student
learning. 5 In the post-TUCE equation, the effect of one year of additional teaching experience
(0.039) is substantially less than that of increasing the average grade point of students by 0.1
point (0.054). 6 This seems to imply that student rather than instructor quality is the key to
classroom performance. 7
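
The computations in this paragraph can be reproduced with a short sketch. The coefficients are those reported for the years beyond Ph.D. and grade point variables in Table 2; the variable names are illustrative, and the 0.1 change is treated, as in the text, as 0.1 unit of the grade point variable.

```python
# Sample mean of years beyond Ph.D., as stated in the text.
MEAN_YEARS_BEYOND_PHD = 7.6

# Table 2 coefficients on years beyond Ph.D. (Y1).
years_coef_tuce = 0.039
years_coef_as = 0.085

# Table 2 coefficient on grade point (X1) in the post-TUCE equation.
grade_point_coef_tuce = 0.542

# Effect of the average instructor's experience on each post-test score.
tuce_gain_at_mean = years_coef_tuce * MEAN_YEARS_BEYOND_PHD
as_gain_at_mean = years_coef_as * MEAN_YEARS_BEYOND_PHD

# Effect on post-TUCE of raising the grade point variable by 0.1.
tuce_gain_from_grade_point = grade_point_coef_tuce * 0.1

print(round(tuce_gain_at_mean, 1))           # about 0.3 point
print(round(as_gain_at_mean, 2))             # about 0.65 point
print(round(tuce_gain_from_grade_point, 3))  # about 0.054, versus 0.039 per year
```

One additional year of experience (0.039) is worth less on the post-TUCE than a 0.1 rise in the grade point variable (0.054), which is the comparison the paragraph draws.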

Consider the other coefficients in Table 2:

1. Note that the constant term in each of the regressions is positive and significant. This 
suggests that other variables which have been excluded from the equation may be 
significant. Thus far, the absence of an effective learning theory has prevented researchers 
from identifying other determinants of learning. 

2. The positive coefficient on the grade point average variable suggests that students who do 
well in other courses also do well in economics. When this finding is combined with the lack
of significance for the Pre-Interest, Pre-Continue, and student-major variables it may




Table 2

The Effect of Teacher Years of Experience
as Indicated by Several Performance Measures

Independent             Post-     Post-     Final     Interest in   Continue
Variable                TUCE      AS        Grade     Economics     in Economics

Constant term           5.591     12.760    3.719     1.543         0.360
                        (16.6)    (18.4)    (14.5)    (9.0)         (6.0)

Grade point (X1)        0.542     0.631     0.376     --            --
                        (6.2)     (5.6)     (9.7)

Pre-TUCE (X2)           0.219     --        --        --            --
                        (5.5)

Pre-AS (X3)             --        0.313     0.050     --            --
                                  (9.5)     (4.4)

Sex (X4)                -0.371    --        --        -0.306        -0.109
                        (2.1)                         (2.8)         (3.1)

Major (X5)              --        --        --        --            *

Pre-Interest (X6)       --        --        --        0.347         0.058
                                                      (5.7)         (2.9)

Class standing (X7)     --        --        -0.1190   --            --
                                            (2.5)

Pre-Continue (X8)       --        --        --        --            0.408
                                                                    (8.5)

Years beyond            0.039     0.085     --        --            --
Ph.D. (Y1)              (4.0)     (3.7)

Number of hours         --        -0.462    --        --            --
taught (Y2)                       (5.1)

Adjusted R²             0.18      0.25      0.22      0.08          0.21

F-Ratio                 27.11     40.94     33.27     22.68         31.95

Adjusted
standard error          1.87      2.44      0.84      1.17          0.38

*Significant at 5-percent level if Pre-Continue variable is eliminated but insignificant when it is included.



suggest that a student's learning skill, rather than his prior interest in economics, is the 
crucial factor in determining how well he learns economic principles.

3. A student's pre-TUCE score presumably reflects his knowledge of economics when he 
enters the course. Surprisingly, a high prescore does not appear related to a high final grade,
again suggesting that prior exposure to the field may not be as important as the ability to 
learn. 

4. Performance on the pre-AS exam is significant in determining post-AS score and final
grade. Since the AS test presumably measures economic reasoning rather than prior 
knowledge, this may explain why AS is significant and TUCE is not. An increase of about 
3.3 points on the pre-AS exam adds one point to the post-AS score and 0.17 points to the
final grade. 

5. Women perform less well on the TUCE than do males. They are also less interested in 
economics both when they enter and when they leave the course. This is reflected in fewer 
female continuations to other economic courses. 

6. We know surprisingly little about what determines a student's interest in economics, either
before or after the course.

7. The hours taught variable is included in the regression to control for the effects of variation 
in workload on teacher effectiveness. It is difficult to interpret in a small sample, however, 
since hours taught are not evenly distributed across ranks. An increase in the number of
hours taught appears to have an effect on post-AS score but this may be due to its correlation
with the graduate instructor variable.

8. Finally, although the R²'s are low, the F-tests are all significant at a 1-percent level and the
R²'s are in line with those of other researchers in this area.

The Effectiveness of Graduate Instructors 

Since experienced teachers raise student scores on the TUCE and AS, one might expect
that departments using graduate instructors in lieu of faculty experience a decline in student 
learning. Such an expectation ignores the possibility that graduate instructors compensate for
their lack of experience by their enthusiasm, efforts to identify what their students don't 
understand, approachability, and greater rapport with their class. To determine the effect of 
graduate instructors in the principles sequence during the summer of 1973 and in subsequent 
quarters, the economics department offered several sections taught by graduate students. These 
students developed their lectures free from departmental directive, although they followed a 
common course outline. 8 The sole criterion for selection as an instructor involved the student's 
past rating as a teaching assistant.

If graduate instructors affect student performance, this should be reflected in a significant 
t-value for the Y3 variable when it is included in the regressions discussed above. Estimate I in
Table 3 shows the regression coefficient obtained by including Y3 in the above regressions.
Only the regression coefficients for Y1 and Y3 and the adjusted R² and F-statistic are shown
since most of the other regression coefficients do not change substantially. Because graduate
instructors teach fewer hours, however, Y3 is correlated with Y2. Likewise, Y3 and Y1 are also
correlated. Thus, the inclusion of all three variables masks the significance of the Y3 variable.
To provide an alternative test of the effect of graduate instructors, the Y1 and Y2 variables were
removed from the model and the regressions rerun with only the Y3 variable (Estimate II).

Table 3

Two Estimates of the Effect of Graduate Instructors
on Student Performance

                        Post-     Post-     Final     Interest in   Continue
                        TUCE      AS        Grade     Economics     in Economics

Graduate Instructor
  Estimate I            0.193     0.024     -0.311    0.214         0.014
                        (0.7)     (0.1)     (1.9)     (1.6)         (0.3)
  Estimate II           -0.362    1.011     -0.014    0.214         0.014
                        (1.7)     (3.6)     (0.2)     (1.6)         (0.3)

Years beyond Ph.D.
  Estimate I            0.044     0.085     a         a             a
                                  (3.5)

Adjusted R²
  Estimate I            0.18      0.25      0.22      0.09          0.21
  Estimate II           0.16      0.23      0.21      0.09          0.21

F-Ratio
  Estimate I            21.78     32.68     27.53     16.07         25.53
  Estimate II           23.09     48.57     32.76     16.06         25.53

a Denotes that the variable was not included in Table 3 because it lacked statistical significance.

The specification of the equation affects our conclusions regarding the effects of graduate 
instructors. With Y3 included as the sole instructor variable, the results suggest that graduate
instructors have a positive effect on student performance on the AS test and a negative effect on 
performance on the TUCE. They have a nonsignificant effect on the other measures. With the
three Y variables included together, Y3 has no significant effect on TUCE and AS but a
negative effect on the final grade. 9 Taken together the results suggest that experienced faculty
may do a better job in teaching the skills required for the TUCE exams while graduate 
instructors have a more positive effect on student attitudes. 

Conclusion 

Prior studies raise a paradox highlighted in our findings. Experienced faculty presumably 
have a positive effect on student performance yet graduate instructors appear to be as effective 
as faculty in teaching economic principles [5]. Can this paradox be explained? What may be 
involved here are different sets of skills: in the case of graduate students, availability to grasp 
what students don't understand, enthusiasm, and approachability; for experienced faculty, a 
greater depth of understanding of the technical material, greater self-confidence, more critical
approach to the subject. Given the approach followed in this study, one can only speculate as to
the underlying factors explaining our results. The measures of learning used here capture only
part of the total learning environment. An effective instructor conveys many things not
captured by our measures: excitement with the subject, caution in accepting unsupported
arguments, a perspective on the economic system, etc. We have yet to develop measures which
provide an effective comparison of faculty and graduate instructors along these lines. Given the
mixed nature of our findings, however, we would urge economic researchers to use more than
one measure of effectiveness in studies of this type. 

Footnotes 

1 We were forced to limit the number of questions asked because of time constraints. Previous studies in this
journal found that TUCE questions have a high degree of internal reliability. The Kuder-Richardson 
formula 20 statistic for the modified exam was 0.6. 

2 The total enrollment in the period was over 1,000. Since it was not possible to test all sections taught in the
department, a representative sampling of the principal faculty was obtained. 

3 Two factors account for the less than 100 percent response rate: an inability to match pre- and 
postquestionnaire and absence when either the pre- or posttest was given. A check of the unmatched data 
suggests that no bias is introduced from the former source. Every effort was made to eliminate absences as 
the study progressed. Whether they introduce a serious source of bias is uncertain. 

4 Only the significant variables are included in the final regression equation. 

5 On this point see [8].

6 Obviously this option is not open to most departments.

7 Ideally, years of teaching experience rather than years beyond Ph.D. should have been used. For the
faculty in this sample, the distinction is not important. Likewise, because of the distribution of course
loads it was not necessary to control for whether an instructor previously taught introductory economics or 
whether his major assignment was advanced undergraduate or graduate level courses. 

8 The exception to this rule was the fall quarter where one faculty member and two graduate instructors
taught a coordinated set of courses. See Barbara and Howard Tuckman, "Toward a More Effective 
Economic Principles Class," Journal of Economic Education, Special Issue No. 3 (Spring 1975). 

9 The insignificant coefficient for Y3 in Q1 of estimate I seems to be because Y1 and Y2 pick up this variation
when they are included in the regression. The negative coefficient for Y3 in the Q3 equation appears to be
due to a difference in the grading procedures followed by two faculty in the sample. See Barbara and 
Howard Tuckman, op. cit. 
Kim Sosin and Campbell R. McConnell

The Impact of Introductory
Economics on Student Perceptions
of Income Distribution



The purposes of this paper are (1) to survey student attitudes on the issue of 
income inequality; (2) to determine whether one semester of economics causes 
any statistically significant shifts in student attitudes on the income distribution 
issue; and (3) to determine which, if any, of a number of student background 
characteristics might be significant in explaining student perceptions of what the 
personal distribution of income ought to be. 

Design of the Study 

The primary focus of this study was an experimental group consisting of 
students at the University of Nebraska-Lincoln, mostly sophomores, who were 
enrolled in the first semester of a two-semester principles-of-economics 
sequence. The control group was made up of students enrolled in an introductory 
human geography course. The only function of the control group was to permit a 
test of the proposition that changes in attitudes of economics students occur 
because of the study of economics rather than external events. Students in the 
control group were dropped from the study if they had taken the economics 
course, since it might have influenced their preassessment attitudes. 

During the semester students in the experimental group covered a range of 
subject matter generally associated with the macro segment of the principles 
course. The semester's work centered upon four areas: (1) an introductory 
segment which defines economics and surveys basic concepts and institutions 
(e.g., markets and prices, comparative advantage, the role of the public sector, 
etc.); (2) macro theory and fiscal policy; (3) money, banking, and monetary 
policy; and (4) economic growth. In connection with a brief analysis of the 
redistributive function of government, students were assigned reading material 
which surveys basic empirical data on personal income distribution, discusses the 
causes of income inequality, and presents the various arguments which comprise 
the cases for and against greater income equality (specifically, see McConnell, 
pp. 108, 760-767). Although the income distribution issue was not explored at 
great length in class lectures, the actual distribution of personal income by 
quintiles was presented and it was noted that (1) certain groups would not be 
able to participate effectively in a market economy and hence would receive little 
or no earned income and (2) government has in fact attempted by a variety of 
means (direct market intervention, welfare programs, and tax policies) to 

Kim Sosin is assistant professor of economics and Campbell R. McConnell is 
professor of economics at the University of Nebraska-Lincoln. 
TABLE 1 
Attitude Assessment Scores [a] 

Group                                          Mean      Standard Deviation 

Experimental (economics) group (n = 134) [b] 
  Preassessment                              -3.4104          4.7969 
  Postassessment                             -0.7388          5.0604 
  Individual difference                      -2.6716          4.1273 

Control (geography) group (n = 38) [c] 
  Preassessment                              -1.5526          5.1133 
  Postassessment                             -0.5526          4.6771 
  Individual difference                      -1.0000          3.8765 

[a] A positive (negative) score indicates a preference for income equality (inequality). 
[b] Although 191 students were enrolled, only 134 were available for both pre- and posttesting. 
[c] Of the 52 students in geography, 38 took both the pre- and posttest. 

redistribute income. Following Arthur Okun, the debate over income distribution 
was summarized as an equality-efficiency trade-off problem, complete with 
his "leaky-bucket" analogy.¹ In addition, many other segments of the course 
touched upon or implied income (re)distribution issues, e.g., unemployment and 
inflation, distribution of the tax burden, international income differentials, etc. 
In presenting the income distribution issue the instructor was cautious not to 
reveal his personal prejudices nor to communicate material in such a fashion as 
to alter student values in a particular direction. 

The measurement instrument used in this study was an attitudinal survey 
which the authors prepared and administered at the beginning and end of the 
semester. This survey consisted of eleven statements pertaining to income 
distribution rules and policies. Each student was asked to indicate whether he or 
she "strongly agreed" (SA), "agreed" (A), was "undecided or neutral" (U), 
"disagreed" (D), or "strongly disagreed" (SD) with each statement. For all 
statements which favored (greater) income equality, numerical weights of +2, 
+1, 0, -1, and -2 were assigned to SA, A, U, D, and SD responses 
respectively. Those weights were reversed for those statements favoring 
(greater) income inequality. In other words, although the responses are ordinal, 
they were treated as cardinal to yield an aggregate numerical score. Possible 
scores could range from -22, indicating an extremely strong preference for 
income inequality, to +22, reflecting the strongest possible egalitarian position. 
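
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, and the split of the eleven appendix statements into those favoring equality (items 2, 3, 4, 5, 8, and 9) and those favoring inequality is an assumption read from the wording of the statements, not a classification given by the authors.

```python
# Weights applied to statements that favor (greater) income equality.
EQUALITY_WEIGHTS = {"SA": 2, "A": 1, "U": 0, "D": -1, "SD": -2}

# ASSUMPTION: items judged to favor equality, inferred from the appendix wording.
PRO_EQUALITY_ITEMS = {2, 3, 4, 5, 8, 9}
ALL_ITEMS = range(1, 12)  # the survey has eleven statements


def attitude_score(responses):
    """Sum the weighted responses; `responses` maps item number -> response code."""
    total = 0
    for item in ALL_ITEMS:
        weight = EQUALITY_WEIGHTS[responses[item]]
        # Weights are reversed for statements favoring inequality.
        total += weight if item in PRO_EQUALITY_ITEMS else -weight
    return total


# Strongest possible egalitarian answer pattern under the assumed classification.
most_egalitarian = {i: ("SA" if i in PRO_EQUALITY_ITEMS else "SD") for i in ALL_ITEMS}
least_egalitarian = {i: ("SD" if i in PRO_EQUALITY_ITEMS else "SA") for i in ALL_ITEMS}
all_neutral = {i: "U" for i in ALL_ITEMS}

print(attitude_score(most_egalitarian), attitude_score(least_egalitarian))
```

Whatever the true classification of items, the extremes of the scale are fixed by the weights alone: eleven statements at two points each give the stated range of -22 to +22.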

At the start of the semester a variety of background information was 
requested of each student, including sex, race, rural-urban character of perma- 
nent residence, marital status, age, cumulative grade average, hours of employ- 
ment, and occupational-income status of parents. To foster anonymity and 
thereby response accuracy, all surveys were identified by partial social security 
numbers rather than names. 
Attitudes and Attitudinal Shifts 

Mean scores of the experimental and control groups are presented in Table 
1 on a preassessment (beginning of semester) and postassessment (end of 
semester) basis. We observe that the stance of both groups was somewhat 
nonegalitarian at the start of the semester, the economics group being more so 
than the geography group. Both groups shifted to a less nonegalitarian stance by 
the end of the semester. A reasonable interpretation of the rather pronounced 
shift of the economics group might be that the material encountered in the 
course made students more open-minded and therefore less certain of their 
original stance. 

Are the changes in scores significant within each of the two groups? 
Calculation of t values indicated that the change for the economics group was 
highly significant at the .01 level (t = -7.4930). The change for the geography 
control group, however, was not significant (t = -1.5902). Is the attitudinal 
change for economics students significantly different from the change for the 
geography students? The calculated t value (t = -2.232) proved to be 
significant at the .05 level. The implication of these results is that exposure to a 
rather typical semester of macroeconomics does seem to entail a statistically 
significant attitudinal shift toward egalitarianism. 
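
As a rough check on these figures, the t values can be recomputed from the Table 1 summary statistics. The sketch below rests on assumptions that the paper does not spell out: the individual differences are taken as negative (preassessment less postassessment, consistent with the negative t values reported), the within-group tests are treated as paired t tests on the individual differences, and the between-group comparison is treated as a pooled-variance two-sample t test. The function names are illustrative only.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """t statistic for a mean individual difference against zero."""
    return mean_diff / (sd_diff / math.sqrt(n))


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using a pooled variance estimate (an assumption here)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err


# Individual differences from Table 1 (mean, standard deviation, n).
t_econ = paired_t(-2.6716, 4.1273, 134)
t_geog = paired_t(-1.0000, 3.8765, 38)
t_between = pooled_t(-2.6716, 4.1273, 134, -1.0000, 3.8765, 38)

print(round(t_econ, 4), round(t_geog, 4), round(t_between, 3))
```

Under these assumptions the three statistics come out close to the reported -7.4930, -1.5902, and -2.232, which suggests, but does not prove, that these were the procedures used.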

It is of some interest to compare the responses in each group to determine 
the directions of the individual attitudinal changes. Are the differences in 
preassessment and postassessment responses among students consistent with the 
hypothesis that only random events influenced their attitudinal development 
over the semester or is there evidence of systematic influences? If events were 
random in impact, our results would be consistent with a hypothesis that 50 
percent of the students became more egalitarian and 50 percent became less, i.e., 
a population proportion of .50. In the economics group, 67.2 percent became 
more egalitarian, which is significantly different from 50 percent (t = 4.241). In 
contrast, in the geography group, 52.6 percent became more egalitarian, but that 
figure is not significantly different from 50 percent (t = .642). The conclusion is 
that the economics course had a systematic influence on the direction of student 
attitudes. 
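
The test of a proportion against .50 can be illustrated for the economics group. The sketch assumes the standard error is computed from the sample proportion rather than from the hypothesized value of .50; that choice is an assumption, adopted because it yields a statistic close to the one reported for the economics group. The function name is hypothetical.

```python
import math


def proportion_stat(p_hat, n, p0=0.5):
    """Test statistic for a sample proportion against p0.

    ASSUMPTION: the standard error uses the sample proportion p_hat.
    """
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - p0) / std_err


# Economics group: 67.2 percent of 134 students became more egalitarian.
stat_econ = proportion_stat(0.672, 134)
print(round(stat_econ, 3))
```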

Explaining Attitudes: Model and Findings 

A multiple regression model was formulated to explain the attitudes of 
beginning economics students toward income distribution in terms of student 
background characteristics. The model, which employs the preassessment scores 
of the economics group (n = 191),² is of the following form: 

$$ATT = a_0 + a_1 S + a_2 R + a_3 P + a_4 M + a_5 G + a_6 H + a_7 A + a_8 O + a_9 I + u$$ 

where: 

ATT = student attitudes on income distribution (attitude survey score) 
S = sex (dummy variable: male = 0; female = 1) 
R = race (dummy variable: white = 0; all others = 1) 
P = population of permanent residence (dummy variable: rural, defined as living on a farm or in a town under 2,000 = 0; urban, defined as living in a town or city of 2,000 or more = 1) 
M = marital status (dummy variable: single = 0; married = 1) 
TABLE 2 

Hypotheses, Rationales, and Expected Signs 
for Independent Variables 

S  Hypothesis: Females are more egalitarian than males. 
   Rationale: Anticipation of a work-life cycle involving periods of dependence upon 
   intrahousehold transfers will cause females to be more favorably disposed 
   toward income redistribution. 

R  Hypothesis: "Others" will be more egalitarian than whites. 
   Rationale: As compared to whites, there is a greater probability that "others" 
   (nonwhites) come from low-income backgrounds and have been 
   beneficiaries of income transfers. 

P  Hypothesis: Students with rural backgrounds are more egalitarian than those with 
   urban backgrounds. 
   Rationale: Historically, the rural (agricultural) sector has benefited from income 
   transfers. Therefore, students from rural areas will be more favorably 
   disposed toward redistributive programs. 

M  Hypothesis: Married students will be more egalitarian than single students. 
   Rationale: Intrahousehold transfers of income are a basic characteristic of households, 
   e.g., the wife may be working to put her husband through college; hence 
   there is an immediate awareness of the potential benefits of income transfers. 

G  Hypothesis: The higher a student's grade average, the less egalitarian he/she will be. 
   Rationale: Students who perform well academically will have reason to believe they 
   will also perform well in the economy and therefore will not be 
   beneficiaries of egalitarian policies. 

H  Hypothesis: The more hours a student works per week, the less egalitarian he/she will be. 
   Rationale: A self-supporting student will be more aware of the work-income relationship. 

A  Hypothesis: Older students will be less egalitarian than younger students. 
   Rationale: Age is widely believed to be conducive to more conservative stances on 
   political, economic, and social issues. 

O  Hypothesis: Students whose parents are business operators or professionals will be less 
   egalitarian. 
   Rationale: Students coming from a business or professional home environment will 
   anticipate greater personal benefit from participation in the market economy. 

I  Hypothesis: Students from high-income families will be less egalitarian than 
   students from low-income families. 
   Rationale: Students who have benefited from their parents' relatively successful 
   participation in the market economy will be reluctant to alter that system's 
   income distribution. 
TABLE 3 

Attitude Survey Regression 

Independent      Coefficient 
Variable         Estimates        t Ratios 

S                  1.7470          2.4515* 
R                  0.8404          1.0538 
P                 -0.5657         -0.8099 
M                  2.6271          1.5395 
G_m               -2.4819         -2.2481* 
G_h               -2.6269         -2.1453* 
H                  0.0310          1.0846 
A                 -0.1889         -0.8563 
O_p                1.7721          1.9988* 
O_w               -0.0245         -0.0284 
I_m               -1.3606         -1.3782 
I_h               -2.1585         [illegible] 
Intercept          2.9408          0.6499 

Standard error of estimate = 4.637 
R² = .1233; R² adjusted = .0642; F = 2.086 

*Significant at the .05 level 



G = cumulative grade average [two dummy variables: (a) G_m = 1 for medium grades in the 2.50-3.49 range on a 4.00 scale, G_m = 0 otherwise; (b) G_h = 1 for grades above 3.49, G_h = 0 otherwise; grades below 2.50 coded zero] 

H = number of hours of employment per week 

A = age of student in years 

O = occupational status of parents [two dummy variables: (1) O_p = 1 if professional-technical, O_p = 0 otherwise; (2) O_w = 1 if "workers," defined as blue and white collar, farmers, etc., O_w = 0 otherwise; business parents coded zero] 

I = student's perception of income status of parents [two dummy variables: (a) I_m = 1 if medium income, I_m = 0 otherwise; (b) I_h = 1 if high income, I_h = 0 otherwise; low income coded zero] 

u = random error term 

Table 2 states the ex ante hypotheses made for each independent variable, 
offers a brief rationale for each hypothesis, and indicates the anticipated 
coefficient sign if that hypothesis is confirmed by the data. There is some 
support in earlier research for the hypotheses associated with variables S, G, and 
A (Gifford et al.; Scott and Rothman). However, a recent study by Riddle 
(1978) suggests hypotheses which are the reverse of those stipulated for 
variables P and M. 

The findings of the model for the experimental group are summarized in 
Table 3. We find that sex (S), cumulative grade average (G_m and G_h), 
occupational status of parents (O_p), and perceived income level of parents (I_h) 
all have the correct (hypothesized) signs and are statistically significant. The 
population (P), marital status (M), and age (A) variables have the correct signs 
but are not statistically significant. The race variable (R) also had the hypothesized 
sign, but the small number of nonwhites involved in the survey renders the 
statistical results highly suspect.³ The employment variable (H) is not significant 
and has the wrong sign. 

The significant t values in the model indicate success in achieving the goal 
of identifying some very important influences on students' attitudes. The low 
R² values suggest that student values are subject to considerable random influences. 
Thus the model would be a poor predictor of attitudes. 
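
The summary statistics reported with Table 3 are related by standard formulas, which the following sketch applies. It assumes n = 191 observations and k = 12 regressors (the twelve variables listed in Table 3, excluding the intercept); the function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """F statistic for the joint significance of all k regressors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Values taken from Table 3; n and k are as assumed above.
adj = adjusted_r2(0.1233, n=191, k=12)
f_stat = overall_f(0.1233, n=191, k=12)
print(round(adj, 4), round(f_stat, 3))
```

With these inputs the formulas return values close to the reported adjusted R² of .0642 and F of 2.086, so the three figures are mutually consistent: the regressors jointly explain only about 12 percent of the variation in attitude scores.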

Performance and Attitudinal Changes 

If the difference in attitudes between the beginning and end of the semester 
is attributable in part to the economics course, it seems plausible that students 
who devoted more effort to the course (thereby receiving a higher grade) 
experienced the largest attitude shift. Furthermore, the level of students' 
attitudes at the semester's beginning may influence the size of the shift. An 
attempt was made to explain attitudinal changes (DIFF) by grade in economics 
(EG), as a measure of effort and understanding in the course, and preassessment 
scores (ATT). 

Because incoming attitudes and course grades are related to student 
personal characteristics, both along with DIFF must be regarded as endogenous 
variables in an equation system with personal characteristics as predetermined 
variables. The personal characteristics in Table 3 plus students' class in school 
were used as predetermined variables in a two-stage least squares procedure to 
yield: 

$$DIFF = 2.0387 + .4245\,ATT - .7577\,EG; \qquad R^2 = .0069$$ 

The t ratio for the coefficient on ATT was 1.9897, significant at the .05 level; 
that on EG was -2.6076, significant at the .01 level. 

A lower (larger negative) DIFF indicates a larger egalitarian attitude 
change. A disappointing R² indicates a large random element in DIFF and 
probable imperfections in measurement. Also, it should be remembered that the 
two-stage least squares procedure, in contrast to ordinary least squares, does not 
maximize R². 
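
To see what the estimated coefficients imply, the fitted equation can be evaluated for hypothetical students. This is arithmetic on the reported coefficients only, not a re-estimation. The example values are invented, and the grade scale for EG is assumed here to be a 4-point scale, which the paper does not state.

```python
def predicted_diff(att, eg):
    """Evaluate the reported equation DIFF = 2.0387 + .4245 ATT - .7577 EG."""
    return 2.0387 + 0.4245 * att - 0.7577 * eg


# Hypothetical students with the same incoming attitude but different grades
# (ASSUMPTION: EG measured on a 4-point scale).
same_att_lower_grade = predicted_diff(att=-5, eg=3)
same_att_higher_grade = predicted_diff(att=-5, eg=4)

# Hypothetical students with the same grade but different incoming attitudes.
less_egalitarian_start = predicted_diff(att=-8, eg=3)
more_egalitarian_start = predicted_diff(att=2, eg=3)

print(same_att_lower_grade, same_att_higher_grade)
print(less_egalitarian_start, more_egalitarian_start)
```

Because a lower DIFF means a larger egalitarian shift, the negative coefficient on EG implies that each additional grade point lowers predicted DIFF by about .76, and the positive coefficient on ATT implies that students who start out less egalitarian have a lower predicted DIFF.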

The significant coefficient results support the hypotheses. First, students 
with the least egalitarian incoming attitudes experienced the largest changes. 
Second, those with the highest cognition, measured by grades, also show the 
largest value changes. 

Summary 

This study suggests that (1) one semester of introductory economics may 
shift student attitudes on income distribution in an egalitarian direction; (2) sex, 
cumulative grade average, occupational status of parents, and perceived income 
level of parents are statistically significant determinants of student attitudes 
regarding income distribution; and (3) students' attitudinal changes varied 
directly with their performance in the course. 

FOOTNOTES 

1. See Okun (chap. 4). Most of one 50-minute lecture was devoted to these aspects of 
income distribution. 

2. The 191 students represented 31 percent of the 601 students who took the macro 
semester of the principles course in the fall of 1978. We have no reason to believe that 
these students did not constitute a random sample of the larger group. 
3. Only five of the 191 students in the experimental group were nonwhites. Of these, 
only one was black and the remaining four were Asian-Oriental students. Also, only 
eight of 191 students were married. The students' ages ranged from 17 to 28, with the 
average age 19.7 years. 

Our model initially contained a dummy variable distinguishing between busi- 
ness and nonbusiness students. This variable was dropped, however, because it is 
debatable whether curriculum choice should be treated as an independent or a 
dependent variable. Does a student develop an anti-egalitarian view as a consequence 
of enrolling in a business college or does his anti-egalitarian position prompt him to 
select that particular curriculum? 
APPENDIX: ATTITUDINAL SURVEY 

1. A nation's income should be distributed to its citizens in proportion to their 
contributions to the production of that output. 

2. If a dollar is taken from a rich person and given to a poor person, the overall 
well-being of society will be increased. 

3. The redistribution of income is a legitimate role for government in the context 
of the U.S. economy. 

4. Poverty is a social disgrace in a wealthy society and should be eradicated 
through appropriate public policies. 

5. The nation's annual income should be shared equally by all families comprising 
the economy. 

6. Workers who make a current contribution to the national output should not be 
obligated to share the fruits of their productive effort with those who are not 
working. 

7. The United States tax system should be reformed so that high-income families 
pay a smaller share, and low-income families pay a larger share, of total taxes 
than is now the case. 

8. A negative income tax (wherein government subsidizes families whose 
incomes fall below a specified level) should be legislated by the 
Federal government. 

9. A nation's income should be distributed to its citizens in accordance with their 
needs. 

10. Families in the United States should not be guaranteed a minimum annual income. 

11. Income should be distributed according to economic factors, not by political 
decisions. 
An Analysis of the Marginal Products of the 
One- and Two-Semester Introductory 
Economics Courses 

Ralph D. Elliott, M. Edwin Ireland, and 
Teresa S. Cannon 



Background Information and Introduction 

Many colleges and universities offer two types of introductory courses in economics. 
These generally consist of the traditional two-semester sequence and a one-semester course. 
The two-semester course supposedly provides the student with the necessary technical tools for 
more advanced work in the field while the one-semester course usually highlights major 
economic issues and introduces economic concepts by developing alternative solutions to those 
real-world problems. 

The purpose of this study is to determine whether students learn significantly more basic 
economic concepts in the traditional two-semester sequence than in the increasingly popular 
one-semester course. If not, the efficiency of our production process (teaching economics) can 
be improved and significant savings of student and faculty time and material costs can be 
achieved by substituting the one-semester for the two-semester course. 

The results of past studies conflict. For example, Klos and Trenton [4] found that a good 
one-semester course can be at least as effective in improving student learning as the 
two-semester course. In contrast, Paden and Moyer [8] found that a two-semester course 
resulted in about a 10 percent improvement in test scores compared to a one-semester course. In 
addition, Dawson and Bernstein [2], in a study of introductory economics courses in colleges 
and universities in New York State, found that students who had completed a two-semester 
course scored significantly better on the Test of Economic Understanding (TEU) than students 
who had completed a one-semester course. 

The present research effort is different from the previously mentioned studies in two ways. 
First, and perhaps most important, one-semester courses are significantly different now than 
they were in the late 1960s and early 1970s, when the other studies were carried out. Instead of 
being fast-paced, abbreviated versions of the two-semester sequence, many one-semester 
courses, including those at Clemson, are topic- or issue-oriented and no longer emphasize the 
mechanical and computational skills which have characterized the two-semester sequence. 
Furthermore, the one-semester course now has many books written exclusively for it. 
Instructors of such courses are no longer bound to select chapters from the traditional 
encyclopedia-type economics textbooks. 

Another difference between the present study and previously published studies is our use 
of the Test of Understanding in College Economics (TUCE) as the means of evaluating student 
performance, instead of the older Test of Economic Understanding (TEU). It is true that use of 
TUCE introduces many problems, the most important being that it was written specifically for 
the two-semester sequence. This no doubt biases downward the scores for the one-semester 
course. Other problems with TUCE arose from the nature of many of the questions themselves. 
Some were so ambiguous that many of our faculty could not agree on the correct answer. In 
some cases, the questions related to subjects that were not covered in the one-semester course. 
For example, international trade, which is weighted rather heavily in the macroeconomics part 
of the test, was given only a few lecture hours in our one-semester course. 

In the study reported here, answers to the following specific questions are provided: (1) 
Are there observable differences in the level of basic economic knowledge gained in the 
two-semester course versus the one-semester course? (2) If so, is there a difference in (a) simple 
recognition of basic economic concepts, (b) simple application of economic concepts, (c) 
complex application of economic concepts? 



Experimental Design 

A total of 1,057 students were included in the experiment, with 169 selected from the 
one-semester course (Economics 200) and 888 from the two-semester courses. Of the latter, 
677 were in the macroeconomics course (Economics 201). There were a total of 27 class 
sections involved in this study, 5 from Economics 200, 17 from Economics 201, and 5 from 
Economics 202. Leftwich and Sharp [5] and Rogers [9] were the two texts used in the 
one-semester course. Mansfield [6] and Miller [7] were the two texts used in the two-semester 
sequence.¹ 

At the beginning of the semester, students in these courses were given the appropriate 
TUCE pretests in order to provide data on their initial knowledge of basic economic concepts. 
Midway through the semester the students in the one-semester course were given the 
microeconomics posttest followed by the macro pretest, since the remaining part of that course 
is allocated to macroeconomic concepts.² At the end of the semester, all students were given the 
appropriate posttests. As a result of this testing, 311 complete scores on microeconomics exams 
and 470 complete scores on macro exams were collected.³ 

In order to delineate the factors influencing test scores, linear multiple regression analysis 
was used to match individual test scores with individual student characteristics, thus avoiding 
problems of aggregate analysis. The specified four forms of the dependent variable were (1) 
posttest scores; (2) absolute improvement (ABS IMP = posttest less pretest scores); (3) 
percent improvement [percent IMP = (posttest less pretest)/pretest]; and (4) gap closed = (posttest 
less pretest)/(33 less pretest). An analysis by Gery [3] found the gap closed variable to be the 
most reliable in that it eliminates the problems of convergence and heteroscedasticity that 
characterize the other three. In addition, it controls for the ceiling effect.⁴ Since we tried all four 
specifications, all results are presented for comparative purposes. 
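
The four forms of the dependent variable can be written out as a short sketch. The function name and the example scores are hypothetical; the maximum score of 33 is taken from the gap closed formula above. The two ratio measures are undefined for some pretest scores, so the sketch returns None in those cases rather than guessing how the authors handled them.

```python
MAX_SCORE = 33  # denominator used in the gap closed formula


def outcome_measures(pretest, posttest, max_score=MAX_SCORE):
    """Return the four dependent-variable forms for one student's test scores."""
    abs_imp = posttest - pretest
    # Percent improvement is undefined when the pretest score is zero.
    percent_imp = abs_imp / pretest if pretest != 0 else None
    # Gap closed is undefined when the pretest score is already the maximum.
    gap_closed = abs_imp / (max_score - pretest) if pretest != max_score else None
    return {
        "posttest": posttest,
        "abs_imp": abs_imp,
        "percent_imp": percent_imp,
        "gap_closed": gap_closed,
    }


# Hypothetical student: 12 correct on the pretest, 19 on the posttest.
example = outcome_measures(pretest=12, posttest=19)
print(example)
```

For this hypothetical student the absolute improvement is 7 points, which is about 58 percent of the pretest score but only one third of the 21 points that remained to be gained. Gap closed scales improvement by the room left for improvement, which is why it is less affected by the ceiling on scores.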

The explanatory or independent variables used in the analysis fall into three general 
categories: (1) human capital variables, (2) environmental variables, and (3) effort variables. 
Human capital variables were academic class, marital status, sex, college major, SAT scores, 
and pretest scores. Environmental variables were a course variable, high school size, class 
time, and class size. One effort variable was used: grade point ratio. 

Since our primary goal was to test the impact on TUCE scores of the two-semester course 
versus the one-semester course, we have not hypothesized any particular relationship between 
the various independent and dependent variables. Instead we have concentrated on including in 
the model all the various determinants of students' learning found in the literature and have 
allowed the data to determine the signs on these variables. 
Percent Imp 

Intercept: 3.6953, 3.6953, 0.1120, 0.0154 
    t: (2.1909)**, (2.1909)**, (2.1909)**, (0.1946) 
Sec (1 sem): -1.6473, -1.6473, -0.0500, -0.0868 
    t: (-5.559)***, (-5.559)***, (-5.559)***, (-6.225)*** 
Pretest Score: 0.1721, -0.828, -0.025, -0.027 
(unlabeled): 0.0832, 0.0025, 0.0038 
(unlabeled): 0.0129, 0.0129 
    t: (2.0961)**, (2.0961)** 
High school size: -0.001, -0.001, -0.00004, -0.00006 
    t: (-1.374), (-1.374), (-1.374), (-1.453) 
(unlabeled): 0.5007, 0.0152 
    t: (1.5007), (1.0753), (1.0753), (0.9985) 
(unlabeled): 0.2187, 0.0066 
    t: (0.2440), (0.2440), (0.2440), (0.1999) 
Education: -0.032, -0.032, -0.001, -0.002 
    t: (-0.029), (-0.029), (-0.029), (-0.031) 
Engineering: -0.102, -0.102, -0.003, -0.009 
    t: (-0.200), (-0.200), (-0.369) 
(unlabeled): -0.589, -0.589, -0.018, -0.023 
    t: (-1.586), (-1.586), (-1.586) 
(unlabeled): -0.636, -0.019, -0.036 
    t: (-1.165) 
(unlabeled): -0.006, -0.007 
    t: (-0.515), (-0.515) 
Sex (male) 
Marital status (single): -0.524, -0.524, -0.016, -0.024 
    t: (-0.626), (-0.626), (-0.626), (-0.608) 
R²: .2688, .3123, .3797, .3129 

NOTE: Statistical significance: * = .10 level; ** = .05 level; *** = .01 level. Figures in parentheses are 
t statistics. 

a Since time and size were highly correlated, time was eliminated from the regression using stepwise 
procedures that maximized R². 



Empirical Results 

Overall Ordinary Least Squares Regressions 

The results of the regressions are summarized in Tables 1 and 2. In all cases R² is 
significant at the .01 level. In terms of significance, all models are consistent for both sets of 
data (macro and micro). The section variable, which tests for a difference between the two 
courses, shows significance at the .01 level for all equations.⁵ The coefficient is negative, 
(unlabeled) 
    t: (-0.740), (-0.740), (-0.187) 
Senior: 0.0097, -0.0267 
    t: (0.3337), (0.3337), (0.3337), (-0.354) 
Sex (male): 0.3342, 0.3342, 0.0101, 0.0347 
    t: (1.4320), (1.4320), (1.4320), (1.8943)* 
Marital status (single): -0.368, -0.368, -0.011, -0.17 
    t: (-0.398), (-0.398), (-0.398), (-0.233) 
R²: .3162, .5342, .5342, .5757 

NOTE: Statistical significance: * = .10 level; ** = .05 level; *** = .01 level. Figures in parentheses are 
t statistics. 



signifying that the two-semester students performed better on the TUCE test than did the 
one-semester students. In the case of the macro regressions, the gap closed coefficient is .09, 
while in the case of the micro regressions, it is .65. Although both coefficients are statistically 
significant, it is questionable whether the macro coefficient has any practical significance. In 
other words, the marginal benefits of an additional semester of macroeconomics are relatively 
small compared to the gain in microeconomics. Of course, both gains have to be compared to 
the marginal cost of an additional semester of work.⁶ 
NOTE: Statistical significance: * = .10 level; ** = .05 level; *** = .01 level. Figures in parentheses are 
t statistics. 



Ordinary Least Squares Regressions by Category of Question 

The category regressions are designed to demonstrate what influence the 
two-semester-versus-one-semester dummy variable as well as the other independent variables have on the 
posttest TUCE score when it is divided into the categories of difficulty, i.e., recognition and 
understanding (RU), simple application (SA), and complex application (CA).⁷ The macro and 
micro results are shown in Tables 3 and 4. On all three question types, the macro results 
indicate that students taking a full semester of macroeconomics outperform the students taking 
about half a semester of the topic. The micro results indicate a significant improvement in 
NOTE: Statistical significance: * = .10 level; ** = .05 level; *** = .01 level. Figures in parentheses are 
t statistics. 



performance on RU and SA questions. However, the coefficient on the 
section dummy for complex micro applications is not statistically significant. The result implies 
that one-semester students have the same grasp of complex micro concepts as do two-semester 
students. 

The coefficients of determination are very low for these regressions. This would imply that 
limited weight should be given to the results except where they are consistent with previous 
findings. 



Conclusion 

With the exception of the micro equation for complex applications, all the section 
variables in the study were statistically significant. This means that students taking two 
semesters of micro- or macroeconomics may demonstrate a significant statistical improvement 
on their TUCE performance. However, students have to incur a 100 percent increase in 
cost— they must take a second semester of economics — in order to achieve the gains in 
performance found in the study. 

Until the marginal cost and marginal benefits are quantified, only a value judgment can be 
made as to whether these gains are worth their added cost. However, these results imply that in 
some cases a reallocation of resources from the two-semester sequence to the one-semester 
course might prove beneficial. That is, if only the one-semester course were offered, then the 
concentration of teaching resources on it might improve the quality of that course sufficiently to 
offset its minor deficiency. 



FOOTNOTES 

1 The one-semester course is issue oriented: First, a problem is outlined as it is perceived by the public. The 
economic aspects of the problem are then discussed, after which a few basic economic tools are developed 
and applied to it. Micro concepts tend to receive more emphasis than macro concepts since over half of the 
issues discussed are micro oriented. In addition, the traditional income-expenditure model is not used to 
present macro concepts. Instead, an aggregate supply-demand framework is developed. The two-semester 
sequence is quite traditional and involves a discussion of basic concepts with some emphasis on 
application. In all classes, i.e., 200, 201, and 202, the traditional lecture method was used, with a single 
faculty member teaching the class throughout the term. 

2 Given that the one-semester-course students take microeconomics in the first part of the course, we would 
expect that they would perform better on the macro pretest than students who had never taken economics. 
(Six questions on the macro-TUCE are on microeconomics.) Hence, we might expect a smaller 
improvement on the macro test for those one-semester students than for two-semester students. 

3 As indicated above, 380 students were given the micro pretest and 846 were given the macro pretest. But 
only 311 micro and 470 macro complete observation sets were obtained for final analysis. The loss of 
observations occurred partly because some students were absent from the posttest or had dropped the 
course. A third reason was lack of complete background data on the student; in that case, the observation 
was deleted. A review of the backgrounds of the students who dropped the courses indicated there was no 
apparent difference in type between these students and those who did not drop the courses. 

4 A ceiling effect may exist in the other three dependent variables in that there are a finite number of 
questions on the TUCE, with the result that the higher the pretest score, the less room there is for possible 
improvement on the posttest. The gap closed dependent variable represents actual improvement divided 
by the potential improvement, thus indicating in percentage terms the extent to which the student closed 
the gap between the pretest score and the perfect score. See [1] for additional discussion on this issue. 

5 As indicated by the tables, various coefficients are significant in addition to the section variable. A 
discussion of the implications of the significance of these other independent variables is not included here, 
but is available from the authors upon request. 

6 Additional two-stage least squares regressions were completed on the data. The results were consistent 
with the one-stage models. The explanatory power of the models was not increased, and the section 
variable retained its level of significance. Copies of the two-stage results are available from the authors 
upon request. 

7 Ideally the gap closed dependent variable should have been used to test categories. However, this was not 
possible since there were an unequal number of questions in the various categories of difficulty between 
the pre- and posttest. This situation could have been avoided if the same test had been used as a pre- and 
posttest. 
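
The gap closed measure described in footnote 4 can be illustrated with a short sketch. The function name, the 30-question test length, and the student scores below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def gap_closed(pretest: float, posttest: float, max_score: float) -> float:
    """Actual improvement divided by potential improvement.

    Returns the fraction of the distance between the pretest score and
    a perfect score that the student covered on the posttest.
    """
    potential = max_score - pretest
    if potential <= 0:
        raise ValueError("pretest score leaves no room for improvement")
    return (posttest - pretest) / potential


# Hypothetical 30-question test (an assumption, not the actual TUCE length).
MAX_SCORE = 30

# A weak starter gains 10 raw points; a strong starter gains only 3.
weak_start = gap_closed(pretest=10, posttest=20, max_score=MAX_SCORE)
strong_start = gap_closed(pretest=24, posttest=27, max_score=MAX_SCORE)

print(f"weak starter closed {weak_start:.0%} of the gap")
print(f"strong starter closed {strong_start:.0%} of the gap")
```

Under these assumed scores both students close half of their gap even though their raw gains differ, which is the ceiling effect the measure is meant to adjust for.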

Student-to-Student Tutoring in Economics 

Allen C. Kelley and Caroline Swartz 

While student-to-student tutoring has been used with some success in several disciplines, 
there have been relatively few evaluations of its impact in economics. This note reports on the 
impact of using student-to-student tutoring in freshman economics at Duke University. We 
conclude that this rather simple to implement teaching technique conveys notable benefits to 
those being tutored. Indeed, in the case being reported, student-to-student tutoring conveys 
positive impacts on student achievement in economics which are more than twice as large as 
either (a) the impact of having had high school economics, or (b) the impact of having had high 
school differential calculus. 

During the fall semester of 1975 the TIPS (Teaching Information Processing System) was 
used as a teaching tool in freshman economics at Duke University. TIPS is a computer-managed 
instructional system which facilitates collecting periodic (usually weekly) information 
on student achievement via multiple-choice surveys, which are not usually employed to 
determine course grades. This information is used to prescribe for each student an 
individualized course of instruction through a series of student reports. TIPS has been reported on 
elsewhere and, while it was integral to the course under study, is relatively unimportant to the 
main results analyzed below. 1 Suffice it to say that, through TIPS, weekly 
"surveys" (multiple-choice quizzes) were administered, and the results of these surveys were 
used to identify those students who were performing very well in the course and those who were 
having some difficulty. 

Those performing well on the TIPS surveys were provided the option of either taking the 
forthcoming one-hour examination or of being exempted from the examination and instead 
tutoring students who were having difficulty with the course material. Two alternative tutoring 
formats were used. For the first two portions of the course (approximately 12 of the 15 weeks) 
the tutors were required (a) to attend a one-hour training session, (b) to hold a one- to two-hour 
group tutorial, usually consisting of two tutors and three to six tutees, and (c) to be available on 
a one-to-one basis as requested by the tutees. In the third portion of the course, the requirements 
of tutors were essentially the same, except that the group session was replaced by one hour 
where the tutor was available to tutees in a designated place to answer questions in a one-to-one 
situation. In this case the entire class was invited to use the tutorial service. Virtually all of those 
invited as tutors accepted this option. 2 

Those having difficulty in economics were invited, but not required, to attend the first 
two scheduled tutorial sessions. Around 18 percent of the class was provided this option over 
the course of the semester. Of the class, 11 percent elected to participate in the tutorials. We 
will compare the performance of those who attended the tutorials with those who were invited 
but did not attend. 

Allen C. Kelley is Professor and Chairman of Economics, and Caroline Swartz is a graduate student, 
Duke University. They are grateful for the detailed comments by Professor Rendigs Fels on an earlier 
version of this note, and to Ms. April Bnacll, who assisted with the statistical analysis. 

SOURCE: Journal of Economic Education, vol. 8, no. 1, Fall 1976, pp. 52-55. Reprinted with permission of the 

Those who did not attend the tutorials were self-selected. Thus, we do not have a 
controlled experiment whereby we can confidently abstract from student attributes which may 
be excluded from our study, but which may have influenced student achievement in economics, 
as well as the impact of tutorials. However, for two reasons, we feel that the results presented 
below may be meaningful. First, on a priori grounds, the impact of self-selection works in both 
directions, with possible offsetting results. On the one hand, those who did not attend tutorials 
may have been disorganized, busy, or less motivated, or possess some attribute which may 
result in an understatement of the impact of tutorials. On the other hand, those who did not 
attend may have felt correctly that they knew the materials sufficiently well, and would benefit 
relatively little from the tutorials. Their absence would result in an overstatement of the 
measured impact of the tutorials. 3 Second, a comparison of the background attributes of those 
who attended the tutorials with those who did not (see Table 1) indicates that the two groups are 
very similar. There are no statistically significant differences in individual background 
attributes. This result, together with the findings below that a large portion of the class performance 
is explained by these background attributes, leads us cautiously to accept the hypothesis that the 
two groups are basically similar, and that the results presented below reflect, without notable 
bias, the impact of the tutorial program. 

An alternative, more skeptical (and possibly more realistic) view of the experimental 
design would hold that there is self-selection bias since student motivation, an important 
element in explaining student achievement, is not adequately measured by the attributes 
considered in Table 1. As a result, the findings presented below, which show students in the 
experimental group performing better than those in the "control" group, should be reinterpreted 
to represent the combined impact of tutorials and the somewhat higher motivation of 
those students taking advantage of tutorials. Since it is the motivated students who are likely to 
benefit most from tutoring, or any teaching technique under evaluation, this experiment might 
reflect a relatively efficient format for offering the tutoring option. The results may therefore be 
interpreted in this context. 

The course required students to take three examinations (two midterms and a final), 
totaling 120 points. Students were also required to hand in five cases; the scores on the best four 
(80 points maximum) were included in the total points for the course. The mean and standard 
deviation of the class total points for the course were 177.3 and 13.0, respectively. 

Equation (1) presents an explanation of the student's total score (T) based on background 
attributes (listed in Table 1), and on whether or not the student attended one or more of the 
tutorial sessions. 

\begin{align*}
T ={}& 117.7 + \underset{(3.76)}{.04\,\mathrm{SATVERB}} + \underset{(2.33)}{.03\,\mathrm{SATMATH}} + \underset{(3.43)}{5.7\,\mathrm{HSGPA}} + \underset{(2.37)}{3.98\,\mathrm{MATH1}} \\
&+ \underset{(2.85)}{8.66\,\mathrm{MATH2}} + \underset{(2.09)}{3.23\,\mathrm{ECONB4}} - \underset{(-2.95)}{13.10\,D_1} - \underset{(-2.19)}{8.32\,D_2} - \underset{(-2.79)}{9.39\,D_3} - \underset{(-3.66)}{8.24\,D_4} \\
&+ \underset{(.97)}{2.51\,D_5}, \qquad R^2 = .52 \tag{1}
\end{align*}

The background attributes are all statistically significant with the expected 
sign. Having had economics or differential calculus in high school contributes 3.23 or 3.98 
points, respectively; having had integral calculus contributes an additional 4.68 points. The 
impact of the tutorial program, totaling 8.44 points, is measured as the sum of the differences 
between the estimated parameters for D2 and D1 and for D4 and D3, plus the parameter for D5. 4 
D1 and D3 equal unity for those 
students who were invited and did not attend the first and second tutorials, respectively; D2 and 
D4 equal unity for those students who were invited and attended the first and second tutorials, 
respectively. D5 equals unity for those students who attended the third tutorial. (The entire 
class, excepting the appointed tutors, was invited to participate as tutees.) 

Table 1 

Background Attributes of the Class, the Tutored Group, and the Nontutored Group 
(Mean Value of Attribute; Standard Deviation in Parentheses) 

                                                       Tutored    Nontutored*   Difference Between
                                            Class      Group      Group         (2) and (3)?**
Background Attribute                        (1)        (2)        (3)

SAT verbal score (SATVERB)                  590.3      545.0      535.4         No
                                            (75.8)     (93.3)     (85.3)
SAT quantitative score (SATMATH)            644.5      593.2      603.1         No
                                            (76.3)     (74.1)     (95.3)
High school grade point average (HSGPA)     3.41       3.17       3.16          No
                                            (.45)      (.47)      (.47)
Highest math attainment: differential       .30        .14        .23           No
  calculus (MATH1)                          (.46)      (.36)      (.44)
Highest math attainment: integral           .06        .04        0             No
  calculus (MATH2)                          (.24)      (.19)      (0)
High school economics (ECONB4)              .30        .39        .39           No
                                            (.46)      (.50)      (.51)

*The "Nontutored Group" refers to those who were invited to attend either of the first two tutorial 
sessions but who did not attend any of the three sessions. Invitations to attend only the first two are used 
because these invitations were selective of students having difficulties based on TIPS survey results, 
whereas the entire class was invited to the third session. 

**Is the difference in the indicated attribute statistically significant at the 95 percent level? 

The estimated impact of the tutorial program is large. It is equal to 4.2 percent of the 
possible total points for the course, and to two-thirds of the standard deviation around the actual 
mean total point score for the class. It also has a greater impact than having had both high school 
economics and high school differential calculus. Moreover, its impact is larger, as measured by 
the percentage increment to the student's course score, than the impact of alternative modifications 
in the technology of teaching as reported in many studies in this Journal. Finally, in terms 
of the student's grade in the course, the tutorial program in many cases would have raised the 
student's formal evaluation by a full letter grade. 5 
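
The arithmetic behind these comparisons can be retraced with a brief sketch. The coefficient values are those reported in equation (1) and the course totals are those given in the text; the variable names are ours and purely illustrative.

```python
# Estimated dummy-variable coefficients from equation (1).
coef = {"D1": -13.10, "D2": -8.32, "D3": -9.39, "D4": -8.24, "D5": 2.51}

# Tutorial impact: attended-minus-absent difference for each of the first
# two tutorials, plus the coefficient for attending the third tutorial.
impact = (coef["D2"] - coef["D1"]) + (coef["D4"] - coef["D3"]) + coef["D5"]

total_points = 120 + 80   # three examinations plus the best four cases
class_sd = 13.0           # standard deviation of class total points

share_of_total = impact / total_points
share_of_sd = impact / class_sd

# Combined contribution of high school economics and differential calculus.
hs_econ_and_calculus = 3.23 + 3.98

print(f"tutorial impact: {impact:.2f} points")
print(f"share of possible points: {share_of_total:.1%}")
print(f"share of one standard deviation: {share_of_sd:.2f}")
print(f"economics plus calculus: {hs_econ_and_calculus:.2f} points")
```

The sketch reproduces the 8.44-point impact, roughly 4.2 percent of the 200 possible points and about two-thirds of a standard deviation, and shows that it exceeds the 7.21 points attributable to high school economics and differential calculus combined.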

The results presented above are likely to understate the impact of the tutorials since they do 
not reveal the effect of the program on the tutors. Professors John J. Siegfried and Stephen H. 
Strand, in an evaluation using data from Professor Rendigs Fels's course at Vanderbilt University, 
have discovered that it is the tutors who benefit the most from the program. 6 Indeed, in 
their experiments, the tutees did not benefit from this technique. 

We have little to say about the optimal format for a tutorial program. Ours is but one 
experiment with what we assert is a reasonably low cost format, and one which is likely to yield 
relatively high returns. Hopefully, over time, sufficient information will be available from 
other experiments to form a reasonable judgment on its impact and best format in the teaching 
of economics. 




Footnotes 

1 Allen C. Kelley, "TIPS and Technical Change in Classroom Instruction," American Economic Review, 
62 (May 1972), 422-428; Allen C. Kelley, "Individualizing Instruction through the Use of Technology in 
Higher Education," Journal of Economic Education, 4 (Spring 1973), 77-89. 

2 This is somewhat surprising since these students were told that the total time they would spend tutoring 
would notably exceed the amount of time that they would likely spend studying and taking the midterm 
examinations. Moreover, the grade the tutors would receive would not be a top score, but would vary 
within the "A" range, depending on the evaluation of their sessions by the tutees. 

3 This analysis assumes that the disorganized, less motivated students who stayed away would have 
benefited more than the average of those who attended, and the bright students who stayed away would 
have benefited relatively less than those who attended. 

4 In an exploratory regression model we also attempted to ascertain whether the students being tutored 
benefited differently according to whether or not their high school GPA score was relatively high or low. 
Unfortunately, the relatively low variance of GPA scores (and several other background attribute scores) 
for the group being tutored constrained obtaining a reliable evaluation of the possible existence of 
interaction effects. D2 is statistically different from D1 at the 95 percent level; D4 is not statistically 
different from D3. 

5 Several other "output" measures were investigated (e.g., scores on each examination, total scores on 
examinations, total score on cases), and the results are basically the same as the summary result presented 
in equation (1). For example, a regression was run using only the scores on the first two exams as the 
dependent variable and omitting the D5 variable. The results are: 

\begin{align*}
\mathrm{EXAMS} ={}& 34.84 + \underset{(3.06)}{.02\,\mathrm{SATVERB}} + \underset{(2.88)}{.02\,\mathrm{SATMATH}} + \underset{(3.25)}{3.12\,\mathrm{HSGPA}} + \underset{(2.47)}{2.40\,\mathrm{MATH1}} \\
&+ \underset{(2.64)}{4.64\,\mathrm{MATH2}} + \underset{(2.09)}{1.87\,\mathrm{ECONB4}} - \underset{(3.26)}{8.40\,D_1} - \underset{(2.09)}{4.54\,D_2} - \underset{(2.05)}{4.01\,D_3} - \underset{(3.23)}{4.19\,D_4}, \qquad R^2 = .50
\end{align*}

where EXAMS = sum of first two examinations (80 points maximum). This model has the benefit of 
confining the evaluation to the case where the group invited to the tutorials was selected solely on the basis 
of TIPS survey results. The average increment to the student's potential score was 4.6 percent (3.7/80 x 
100), a figure consistent with the results presented in the text. It should be noted that the second 
examination in the course was exceptionally difficult. 

6 John J. Siegfried and Stephen H. Strand, "An Evaluation of the Vanderbilt-JCEE Experimental PSI 
Course in Elementary Economics," see this Journal, pages 9-26. The impact of the program on the tutors 
will be published in a forthcoming issue of the Southern Economic Journal. 

Textbooks and the Teaching of 
Economic Principles 



Marian R. Meinkoth 



The economic principles course in the large university frequently exhibits a high de- 
gree of standardization. A common text or texts, a common topical outline, a com- 
mon reading list, and a uniform examination may be used. Such standardization 
tends to suppress experimentation by individual instructors and, as the number of students 
and the size of the teaching staff expand, experimentation on a course-wide basis 
becomes very difficult to engineer. Many object to such standardization as too rigid. At 
Temple University the requirement of a uniform textbook for the Principles of Eco- 
nomics course has been considered particularly irksome by many members of the teach- 
ing staff. On the other hand, permitting each instructor freedom to choose his own text 
and other reading materials has been viewed with some apprehension, particularly since 
advanced undergraduate courses in economics depend upon the principles course to pro- 
vide basic groundwork. 

This paper reports on an experiment devised to demonstrate the effect of using vari- 
ous textbooks chosen by instructors. In the experimental group were five faculty mem- 
bers teaching the first half of the principles course. While they varied in rank from 
instructor to full professor, all had had several semesters of experience teaching in 
principles. During the first semester each used Samuelson [1]. During the second 
semester, one continued to use Samuelson although on a more selective basis than 
before, two used Heilbroner [2], one used Alchian and Allen [3], and one used three 
paperbacks by Mundell [4], Schultz [5], and Gill [6]. 

Part I, forms A and B, of the Test of Understanding in College Economics (TUCE) 
[7] was used at the end of the course. Each participating faculty member was given a list 
of the categories covered in the TUCE. 1 For the first semester all instructors used the 
uniform one-page outline of chapter assignments for Samuelson, covering chapters 1-19 
and 37-40. But for the second semester each was left completely free to determine the 
sequence of coverage, to allocate the time to be devoted to each topic, and to make other 
reading assignments. None saw the final exam before it was given in either semester, so 
there was no possibility of bias arising because of familiarity with the examination. 2 

Marian R. Meinkoth is Associate Professor of Economics at Temple University. 

1 They are: scarcity, functioning of economic systems, basic elements in supply and demand; 
macroeconomic accounting; determination of GNP (income-expenditure theory); money, banking 
and monetary policy; government fiscal policies; determinants of economic growth; and policies 
of stabilization and growth. 

SOURCE: Journal of Economic Education, vol. 2, no. 2, Spring 1971, pp. 127-130. Reprinted with permission of the 

Table 1 compares the mean and standard deviation at the end of each semester for 
each individual professor, and for the total experimental group, as well as the TUCE 
norms. The results are surprisingly similar for the two semesters. 3 For Part A the mean 
for the entire group tested was 20.05 for both the first semester and second semester. 
For Part B the result was 20.74 for the first semester and 20.11 for the second. Means 
for individual instructors showed somewhat more variation but did not change by statistically 
significant amounts from the first semester, except in one instance. In the case of 
Professor B1, who was replaced by Professor B2 in the second semester, there was a 
statistically significant difference at the .05 level in the mean score on Part A of the exam, 
but not on Part B. 
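
As a check on how the group means relate to the individual instructors' means, the following sketch recomputes the first-semester totals as enrollment-weighted averages. The figures are the first-semester rows of Table 1 as we read them; the function and variable names are illustrative only.

```python
# First-semester rows of Table 1: (professor, students, Part A mean, Part B mean).
first_semester = [
    ("A", 54, 19.33, 18.96),
    ("B1", 37, 21.35, 20.34),
    ("C", 104, 20.43, 21.13),
    ("D", 103, 19.26, 20.89),
    ("E", 30, 21.10, 22.58),
]


def weighted_mean(rows, column):
    """Average the section means in `column`, weighting each by enrollment."""
    students = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / students


total_students = sum(row[1] for row in first_semester)
part_a = weighted_mean(first_semester, 2)
part_b = weighted_mean(first_semester, 3)

print(f"students: {total_students}")
print(f"Part A group mean: {part_a:.2f}")
print(f"Part B group mean: {part_b:.2f}")
```

The weighted averages agree with the reported first-semester totals of 328 students, 20.05 on Part A, and 20.74 on Part B, which suggests that the group figures are simply pooled across sections.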

The author is familiar with only one other study in which the use of different 
textbooks was compared [8, p. 220]. That study, however, focused primarily on a 
comparison of the results of self-study using programmed learning texts versus a 
conventional textbook in a lecture or discussion course. The conventional textbooks 
included three "well-known" texts, designated as A, B and C, used by 60 percent of the 
students, and ten textbooks designated as "all other," used by the other 40 percent. The 
conclusion was that the textbook used appeared to have significant effects on TUCE 
scores. The mean score for all students taking the test was 17.9 out of a maximum 
possible score of 30. Students using text "A" scored .86 point higher, and those using 
"C" .75 point higher, than the average of students using texts in the "all other" 
category (a significant difference), while students using text "B" scored only .42 higher, 
which is not a significant difference. However, differences between textbook "B" and 
textbooks "A" and "C" were not significant. Since these results were apparently not 
adjusted for other variables, the type of school and quality of students using a particular 
textbook could well have influenced the mean score of students using the book in 
question. 

A questionnaire distributed to the students during the final examination at the end of 
both semesters indicated a slightly more favorable attitude toward the textbook used 
during the second semester. Those students who indicated that the textbook was "helpful 
in understanding the main ideas of the course" increased from 58 to 74 percent. Those 
who found the material in the text difficult to understand dropped from 40 to 36 percent, 
and those who gave the textbook an overall satisfactory rating increased from 57 to 63 
percent. The slightly more favorable attitude toward the texts used in the second semester 
may reflect the fact that an instructor utilizes the text of his choice more efficiently than 
one which is chosen by someone else. 

On the other hand, the change in textbook policy does not seem to have influenced 
the attitude of the student toward economics. Relative to other current courses, 
economics was rated first or second in difficulty by 56.3 percent of all students in the fall, 
and 57.9 percent in the spring semester. Experience with the course influenced 14.7 
percent in the fall semester and 14.5 percent in the spring semester to "take a voluntary 
elective in economics," caused 49.1 percent in the fall and 48.2 percent in the spring to 
"read more economic material in newspapers and magazines," but "persuaded" only 1.3 
percent in the fall and .9 percent in the spring to major in economics. 

2 Graduate students were used as proctors the first time the examination was given. 

3 A detailed comparison of the personal characteristics of the two experimental groups of students 
will be mailed to anyone requesting it if he will send a stamped self-addressed envelope to Prof. 
Meinkoth, Department of Economics, Temple University, Philadelphia, Pennsylvania 19122. 

Table 1 

                                       Part A                      Part B
Professor†         Students        Mean          SD            Mean          SD

First Semester Exam
A                  54              19.33         4.19          18.96         4.32
B1                 37              21.35         3.93          20.34         3.99
C                  104             20.43         4.78          21.13         4.30
D                  103             19.26         4.99          20.89         4.70
E                  30              21.10         4.77          22.58         4.79
Total              328             20.05         4.71          20.74         4.52
National norm A    876             19.16         5.39
National norm B    829                                         19.41         5.30

Second Semester Exam
A                  71              [illegible]   4.17          [illegible]   [illegible]
B2                 [illegible]     19.42         4.19          19.84         4.31
C                  114             20.12         4.49          20.68         4.15
D                  61              19.84         4.75          21.24         4.84
E                  31              22.35         5.40          21.36         5.24
Total              315             20.05         4.64          20.11         4.47
National norm A    876             19.16         5.39
National norm B    829                                         19.41         5.30

†Professor B was unable to continue with the experiment the second semester, so the first-semester 
participant is designated as B1, and the second-semester participant as B2. 

The textbooks used by individual professors during the second semester of the year were: Professor 
A, selected material from [1]; Professor B2, [3]; Professors C and D, [2]; and Professor E, [4-6]. 

Clearly, the relaxation in choice of text caused no great change in either student 
attitude or test results. There is also no evidence that permitting the individual instructor 
to choose his text has a detrimental effect on the course. If a uniform text is not used in 
the principles course, however, it seems desirable to agree on the broad topics to be 
covered in each semester. In addition, a list of optional topics for each semester would 
permit the instructor some individual choice, without duplication between the two 
semesters. If a department feels that some checks are necessary in order to assure a 
degree of uniformity in the coverage of basic economic concepts, it seems preferable to 
test the end result rather than try to control the way in which that end is achieved. 

This suggests a common uniform examination. Hopefully, there will be additional 
updated versions of the TUCE to be used for this purpose. It has the advantage that all 
questions have been carefully tested in advance and that, as long as individual instructors 
have not seen the test beforehand, it gives no advantage to the students of any one 
instructor. But it is useful only when its coverage coincides with the basic material that a 
department wishes its students to master. The alternative is an examination prepared by 
the department, which tends to favor the students of those who play the largest role in its 
preparation. In either case, the exam used to test the achievement of minimum basic, and 
hence hopefully common, knowledge should constitute only a portion of the student's 
course grade, which should reflect not only the results of the common examination but all 
other judgments of the individual professor as to a student's achievement. 

Conclusion 

The results of this experiment suggest: 

(1) that there may be a wide range of reading materials which can be used 
successfully for teaching the basic macro- and microeconomic theory included in the 
principles course in economics; 

(2) that, given as a guide a list of broad topics such as the categories included in the 
TUCE, there will be little or no significant difference between the achievement (as 
measured by a similar uniform examination) of students taught from one standard 
textbook and those using a variety of texts and/or other materials chosen by their 
individual instructors; 

(3) that permitting the instructor to choose his own teaching materials permits him 
to experiment more freely than if he is committed to a text chosen by someone else; 

(4) that the students likewise may receive benefits from the more creative and 
innovative approach permitted the instructor; and 

(5) that some form of uniform examination at the end of the course can be used 
to achieve the minimum amount of uniformity that is desired. 

Teaching Principles of Economics: The Joint 
Council Experimental Economics Course Project 

By Allen C. Kelley* 



For the past several years the Joint Council 
on Economic Education has engaged in a project 
to identify and assess alternative approaches 
to the teaching of college introductory economics. 
The goals of the project, as summarized 
by Arthur Welsh, have been ". . . to 
develop alternative approaches that overburdened 
professors in two- and four-year colleges 
might find more useful than their current offerings 
and to encourage others to improve and expand 
upon the Joint Council's efforts" (Rendigs 
Fels, p. 1). 

Several professors and schools have participated 
in this effort: Kenneth and Elise Boulding 
of the University of Colorado, Rendigs Fels of 
Vanderbilt University, Richard H. Leftwich 
and Ansel M. Sharp of Oklahoma State University, 
Phillip Saunders of Indiana University, 
and Barbara and Howard Tuckman of Florida 
State University. Syllabi and supporting materials 
have now been published as special issues of 
The Journal of Economic Education for the last 
four of these courses (Fels, Leftwich and Sharp, 
Saunders, Tuckman). 1 The course developed by 
the Bouldings was reported upon in preliminary 
form at the December 1973 American Economic 
Association meetings, and thus I will 
concentrate my attention in this review on the 
remaining four. 

I. Overview and Comments on the 
Individual Course Packages 

There is a common set of premises underlying 
the four courses which have been developed. 
Fels summarizes these premises well. 
"Standard textbooks are typically overloaded. 
All too often instructors feel obliged to assign 
the whole book, leading to overloaded courses. 
As a result, the student gains vague familiarity 
with a wide range of economic theory and a 
mastery of none of it. In addition, the typical 
course provides no training in the skills of applying 
economic principles" (Fels, p. 5). 

* Professor and Chairman, Dept. of Economics, Duke 
University. I am grateful for the comments of W. Lee Hansen, 
John J. Siegfried, and Sue Whitesell on an earlier draft 
of this paper. 

1 To obtain a copy of the four syllabi, write Publications 
Department, Joint Council on Economic Education, 1212 
Avenue of the Americas, New York, NY 10036. 

The course developed by Fels carries this 
theme to its most extensive level. This course is 
structured around teaching the application of 
economic theory to a wide range of realistic 
problems. Students review ''cases" which take 
the form of relatively short expositions of eco- 
nomic situations — often based on quotations 
from newspapers, or on edited newspaper ar- 
ticles. Students then work through a carefully 
constructed set of questions designed to train 
them in the process of orderly thinking about 
economic problems. To make the teaching ap- 
proach more easily adopted by others, a case- 
book and instructor's manual have been devel- 
oped, coauthored by Fels and Robert G. Uhler. 
Fels's syllabus shows in detail how cases may 
be incorporated into a "conventional" course 
(i.e., the traditional lecture format), identifies 
the key concepts which should be drawn out for 
each lesson, and even provides sets of notes to 
aid professors in leading class meetings. 

SOURCE: American Economic Review, vol. 67, no. 1, February 1977, pp. 105-109. Reprinted with permission of the 
author and publisher. 

In addition to the case application emphasis, 
Fels has developed his course around the Per- 
sonalized System of Instruction (PSI) approach 
pioneered by Fred S. Keller. PSI employs vir- 
tually no lectures. The lecture period (if used 
for formal instruction) can be devoted to other 
activities: discussion, test taking, tutoring, proj- 
ects, and so forth. Students may proceed at their 
own pace. Their performance is evaluated on 
the number of course units mastered. Assess- 
ment of mastery learning over the various in- 
structional units is accomplished by written or 
oral examinations. Students have considerable 
flexibility in selecting the time to take the exam- 
inations; these may also be retaken any number 
of times, until mastery achievement has been 
demonstrated. Because of the extensive amount 
of testing required in such a course, student un- 
dergraduate proctors are used. The proctors are 
also available for student consultation and tutor- 
ing. They receive three credits toward their eco- 
nomics major for participating as proctors. 
Budget constraints will typically preclude using 
exclusively higher-paid graduate students in 
this role. There is one proctor for approximately 
ten students. Ten proctors is around the max- 
imum a professor can supervise; thus, PSI 
classes do not usually exceed one hundred stu- 
dents. 

We are fortunate to have available, even at 
this early state, an evaluation of Fels's course 
by John J. Siegfried and Stephen H. Strand, 
neither of whom was involved in the course de- 
velopment or in its implementation (Siegfried; 
Siegfried and Strand). Their careful studies 
reveal that 

". . . (1) PSI students performed no better or no 
worse on multiple choice or essay examinations 
than students in the conventional lecture course; 
(2) there was no difference in performance in 
subsequent economics courses . . .; (3) students 
liked the course more, thought they learned 
more, and felt they were examined and graded 
fairer in the PSI course; (4) there was no dif- 
ference in the amount of time spent on the 
course activities between PSI and conventional 
course students; (5) the tendency to elect eco- 
nomics as a major was unrelated to the method 
of instruction . . .; and (6) the student-proctors 
learned more economic theory than they would 
have learned from an alternative upper class 
economics elective." (Siegfried, pp. 32-33) 

The latter effect, while based on a small sam- 
ple, is quantitatively quite large. 

Fels's contributions lie in two quite separate 
areas: the application of PSI to economics, and 
the development of the case-application ap- 
proach. In a sense he has provided two course 
packages. 

Of the four course packages under review, 
Fels's course is not only the most innovative, 
but also the most complete and diffusible. It 
includes the course syllabus, illustrative exami- 
nations, notes to professors to guide their dis- 
cussion sessions, cases, instructions to stu- 
dents, and so forth. 

A few relatively minor qualifications might 
be expressed relating to Fels's course. First, the 
course is highly labor intensive — especially the 
PSI format, but to a lesser extent the case ap- 
proach. To obtain proficiency in economic anal- 
ysis, students must write several cases; these 
must be graded, and students must be provided 
detailed feedback. While I am convinced that 
the case-applications approach represents a su- 
perior way of teaching the most important ele- 
ments of economic principles, I would still like 
more information on alternative formats of 
using this approach which employ less labor in- 
puts. I suspect that a totally case-oriented 
course with active student involvement, and a 
course which is also economically feasible in a 
wide range of colleges and institutions, is yet to 
be discovered. I hope that considerable experi- 
mentation with alternative formats will be stim- 
ulated by the high quality materials made avail- 
able by Fels and Uhler. Equally important, I 
hope that professors evaluate the effectiveness 
of these alternative formats, and then report 
their results. 

A second qualm relates to the difficulty of the 
materials. While Fels has provided suggestions 
for lowering the difficulty level, I doubt if these 
suggestions will be sufficient. He has set his 
standards high. More experimentation will be 
required to develop a less demanding set of 
course objectives— possibly by trimming some 
of the content, but not necessarily the level of 
required analysis. Parenthetically, some would 
seriously question the feasibility of the goal of 
teaching freshmen and sophomores how to 
engage in simple and complex application of 
economic principles. My own position is that 
this goal may not be feasible for the majority of 
this group of students. However, the value of 
the benefits obtained for those who achieve this 
level of proficiency will far exceed any 
foregone learning of "tools" usually taught in 
the principles course, since these tools, typi- 
cally taught without extensive application, are 
rapidly forgotten for lack of purpose. 

Finally, it must be kept in mind that the PSI 
approach, while promising, is costly. Fels has 
noted that instructors should probably be 
awarded more than one course "credit" to pro- 
vide them the incentive and time to implement 
PSI. Moreover, critical to this approach is the 
availability of student proctors. It is extremely 
encouraging that Siegfried's research has 
shown that the benefit of proctoring to the learn- 
ing of economics notably exceeds the opportu- 
nity cost of the typical upper division eco- 
nomics course. More hard evidence on this 
point from other studies could be decisive in 
making college administrators and faculty re- 
ceptive to providing course credit for proc- 
toring, and thus effectively opening up the fi- 
nancial and technological viability of PSI. 

The course by Leftwich and Sharp centers 
around the "issues approach" to teaching eco- 
nomics, and also stresses application of eco- 
nomic principles. They have developed a book 
which focuses on the various issues taken up in 
their syllabus. They have also provided refer- 
ences to several other books which offer perti- 
nent cases and materials. Their issues are gener- 
ally broader than Fels's cases, and as a result, 
there are fewer of these issues, and each one oc- 
cupies around a week of the course. The syl- 
labus itself provides for each issue "major dis- 
cussion points," "economic concepts and 
principles" covered, and a "recapitulation." 
While these aids are useful, the course package 
would have been considerably more valuable 
and adaptable by others had the authors also 
provided guidelines and suggestions to the pro- 
fessor for actually organizing and leading the 
various discussion sessions, together with test 
items which evaluate the students' mastery of 
the various issues. The issues, however, are 
well chosen and should engender considerable 
student interest. Moreover, in unpublished re- 
search results, the authors have argued that the 
exclusive teaching of economics around these 
several issues does not result in any notable sac- 
rifice of the basic tools typically taught in the 
standard economics course. 

Saunders' course confronts the difficult prob- 
lem of developing and coordinating an applica- 
tion-oriented offering involving the participa- 
tion of as many as twenty different instructors. 
He has assembled with the participation of his 
colleagues a consensus on a required "core" of 



economic analysis which is common to all sec- 
tions, and which is tested by a common final ex- 
amination. Each professor is then free to de- 
velop specific application emphases, e.g., 
environmental economics, current economic is- 
sues, income distribution. Students are free to 
select the orientation that interests them most. 
A review of the alternatives open to students at 
Indiana University reveals a smorgasbord of 
courses seldom found in even the combined of- 
ferings of a dozen institutions of higher educa- 
tion. 

Saunders' contributions lie mainly in his in- 
teresting and useful presentation of information 
on course planning, teaching techniques, and 
research formulation. He has presented valu- 
able tips, in highly readable form, on such 
topics as the formulation and evaluation of ex- 
aminations, the development and assessment of 
course evaluations, the preparation of course 
objectives, the construction of research designs 
for appraising teaching, the techniques of coor- 
dinating courses with many professors, the al- 
ternative ways to use and train graduate student 
instructors, and so forth. Some useful base-line 
data are also presented on course evaluation 
surveys and examinations. 2 In my judgment, 
his syllabus is most helpful on the general 
methods of course development, and especially 
on the techniques of coordinating and influenc- 
ing many instructors to adopt a common core of 
economic analysis, yet allowing each professor 
to "do his or her own thing." This format 
makes faculty more amenable to undertaking 
the perceived "chore" of teaching principles of 
economics, and at the same time, increases 
their effectiveness. 3 

2 For example, on a point that has received considerable 
attention (J. Econ. Ed., Fall 1973), Saunders' data col- 
lected over eight semesters from almost nine thousand stu- 
dents indicate that student performance on the common 
final examinations is positively and significantly associated 
with the evaluation rating of the instructor from whom they 
took the course. 

3 Subsequent to the publication of the syllabus being re- 
viewed here, Saunders has developed two student work- 
books (one for microeconomics and one for macroecono- 
mics) that define the analytical core for each semester of 
Indiana's introductory course in terms of specific examina- 
tion questions and a set of 15-[illegible] homework problems. A 
special seminar has also been developed to train graduate 
student instructors to become more effective teachers of in- 
troductory economics. 

Barbara and Howard Tuckman's course is 
possibly the most traditional of the ones of- 
fered, and as a result, will be quite easily 
adopted by many institutions. Its innovations 
are less in the course content area, and more in 
the areas of providing techniques for motivating 
students to learn economics, and providing stu- 
dents flexibility in their pace of studying eco- 
nomics and of taking examinations. Students 
are placed in a simulated environment of being 
actual consultants to the government and are 
required to complete "memoranda" (disguised 
workbook and/or case application-like exercises 
of a policy orientation) for the President and his 
key advisors. The authors also offer a few spe- 
cific and useful ideas for arousing student inter- 
est during the classroom presentation of some 
key economic tools: the consumption function, 
the multiplier, and so forth. Their attempts at 
self-pacing of student learning met with mixed 
results. When provided freedom of time alloca- 
tion students procrastinated, creating adminis- 
trative and testing burdens at the conclusion of 
the course. The authors have thus elected to 
offer rewards and penalties (in the form of 
course points) which have the net effect of 
bringing many of the students back toward the 
traditional study mode in terms of time alloca- 
tion. Their findings accord with my own hunch 
that the self-paced mode is suitable primarily 
for the more highly motivated and disciplined 
students — and these, lamentably, are not the 
vast majority of students in American higher 
education. 

II. Overall Appraisal of the Joint Council 
Program 

With the exception of Fels's package, each of 
the remaining syllabi is somewhat incomplete 
in providing all the elements of a "turn-key" 
offering for the overburdened economics in- 
structor. However, I do not consider this a 
telling deficiency of the developmental effort. 
Something should be left for the instructor to 
do, overburdened or not. More important, I find 
that the various packages, when taken together, 
provide a surprisingly wide variety of materials 
and ideas that a large number of instructors 
should find useful in their course planning and 



implementation. For example, the instructor 
will find very helpful Saunders' and Tuckmans' 
techniques of course planning and course coor- 
dination with many teachers; the instructor may 
utilize some of Fels and Uhler's cases, and 
some of the issues provided by Leftwich and 
Sharp; and the instructor will also hopefully 
engage in some course evaluation of the type 
illustrated by Siegfried, Strand, and Saunders. 

Other conspicuous achievements of the Joint 
Council program are both the respectability it 
has provided the idea of teaching more limited 
course content with higher emphasis on applica- 
tion, and its many suggestions on ways for ac- 
complishing these goals. Two books containing 
useful case and issue materials have resulted 
from the project; others are now available from 
commercial publishers, in part in response to 
the project itself. While the optimal format for 
using the case approach may not yet have been 
discovered, this approach is here to stay, and 
will gradually improve over time. 

Finally, these syllabi will serve to stretch the 
thinking of the teacher trapped in the routine of 
providing students with a comprehensive cover- 
age of encyclopedic texts. Such instructors, 
with the aid of these Joint Council materials, 
can rediscover some of the excitement of teach- 
ing. For example, while the Tuckmans' Presi- 
dential Policy Memoranda do not represent a 
notable breakthrough in concept coverage, this 
technique does indeed illustrate an innovative 
way of "packaging" economics to engender 
some increased enthusiasm and interest — and 
that contribution is not to be discounted. These 
syllabi will encourage other professors to break 
out of their traditional mold, and to discover 
new techniques of teaching economics in a 
more exciting and effective manner. Certainly 
the syllabi demonstrate conclusively that this 
field is wide open, and that this activity can be 
intellectually rewarding. 

In pointing to the future of the experimental 
courses under review, and to other develop- 
mental efforts of this type, two broad issues 
should be raised. First, it would be good to es- 
tablish a modest program to track over time the 
successes and failures of these courses. One 
might, for example, document over a period of 
five years the experience of a group of adopters, 
asking each to respond periodically to a care- 
fully constructed series of questions. Several 
issues come to mind. How many of the original 
set of adopters continue to use the materials 
after five years? Why do they continue (or stop) 
using the materials? What role do the adopters 
play in changing the original course packages, 
and in what specific ways? 

Second, a review of these syllabi raises anew 
the critical question of how best to evaluate and 
compare various courses and teaching ap- 
proaches. While instruments such as the Test of 
Understanding in College Economics are use- 
ful, this test is not by itself sufficient to compare 
such divergent courses as those represented by 
the syllabi under review. The time is ripe in 
economic education research 1) to identify a 
fairly comprehensive list of outputs which are 
likely to be impacted by changes in course con- 
tent and teaching technique, 2) to develop reli- 
able and standardized instruments and proce- 
dures for measuring these outputs, and 3) to 
attempt to obtain some weights on the value of 
the outputs as provided by the various clientele 
of our teaching programs. Such a "Manual of 
Instructional Outputs and Their Measurement" 
would include items ranging from student en- 
rollments, the number of majors, student "en- 
joyment" of the course, faculty willingness to 
teach the course, to such items as student prob- 
lem solving skills in general, knowledge of eco- 
nomic tools, ability to apply economic tools, 
social and political values, and the like. 4 The 
ability to utilize the wide range of research 
results that have been forthcoming in the Jour- 
nal of Economic Education and elsewhere is in- 
creasingly being constrained by the heteroge- 

4 Attention to developing and systematizing such a set of 
instruments will be provided in a recently undertaken five- 
year project funded by the National Institute of Education, 
and undertaken by Richard Attiyeh, Keith Lumsden, and 
myself. We will be developing several alternative course 
packages for teaching economics, employing Teaching In- 
formation Processing System, programmed learning, com- 
puter games, cases, and the conventional lecture technique. 
Several dozen schools will be participating in the testing 
and evaluation of these various packages. The ability to 
identify, measure, and compare instructional outputs across 
course packages is clearly critical to the research. 



neity of the set of instructional outputs chosen 
by various researchers to evaluate their teach- 
ing, and the wide range of techniques employed 
to measure even quite similar outputs. Compar- 
isons are therefore notably hampered. We 
should begin thinking about systematizing our 
efforts in identifying, measuring and weighting 
the various outputs of education in general, and 
of economic education in particular. A careful 
evaluation of the Joint Council experimental 
courses could represent an excellent application 
of, and stimulus to, such an outcome. 

Test Information: 

An Application of the Economics of Search 



William F. Barnes 



Economic analysis is applied in explaining increasing varieties of human behavior, for 
example, marriage, fertility and crime. Frequently the applications identify the gains and costs, 
and utilize the marginal framework to discuss resource allocations. Recently the marginal 
framework has been applied to explain an individual's behavior in acquiring additional 
information when confronted by imperfect information [5]. The approach has been widely used 
in discussing search behavior in the labor market [7,9]. This framework appears to have a 
useful application in the study of student search for, and acquisition of, information about test 
questions. In this paper the acquisition of test information is analyzed by identifying the gains 
and costs associated with investment in search. The consideration of search gains and costs 
predicts explanatory variables of the acquisition of information. Implications are tested against 
empirical evidence from the performance of college students on tests in economics classes. 

Numerous psychological studies explain cheating among college students as deviant 
behavior [1, 3, 4, 6, 10]. In most studies a questionnaire sample survey of self-reported 
behavior provides the empirical base. The association of cheating to personality characteristics 
is emphasized. The relation of cheating to demographic variables is reported less frequently, 
and with little explanation. Statistical tests are typically two-way cross tabulations. In contrast, 
this paper treats the acquisition of test information, which is only one form of cheating, as 
rational, though not admirable, behavior. Data are from an experiment that permits cheating 
behavior to be observed independently rather than self-reported as in a questionnaire survey. 
Predictions are tested by a multivariate statistical technique. 

Identifying Test Information Acquirers 

Empirical evidence was obtained from written examinations of college students in six 
sections of a one-quarter (10-week) junior-senior level course in labor economics, which was 
required with a C or better for most students enrolled in the business program, but not required 
for students recently entering the college because of a curriculum change. The six sections were 
offered two sections per quarter for three quarters. One instructor presented the same lectures in 
all sections. Class size varied from 48 to 131 students. 

For the two sections of the same course offered at different times of the day 50 percent of 
the multiple-choice questions were included in both the exams given. The remaining questions 
were different for each of the two exams. Question sheets were collected when students turned 
in answer sheets. Each quarter all questions were new. Students who acquired information 
about the test (rather than knowledge of the subject) were identified by their performance 
relative to the classes' mean performance on exam questions for which information was 
potentially available, compared to their performance (relative to the mean performance) on 
questions for which information was not available. 1 This procedure was followed in the first 
test of the quarter for three quarters. 

Test information was potentially available to students in the second section of the day from 
students in the first section. In the second section each student's score for the repeated questions 
($y$) was compared to the section's mean score ($\bar{y}$), calculated by the ratio $y/\bar{y}$. Also, each 
student's score for the new questions ($x$) was compared to the section's mean ($\bar{x}$), calculated by 
the ratio $x/\bar{x}$. The ratio $(y/\bar{y})/(x/\bar{x})$ shows the student's performance relative to the class on the 
repeated compared to the new questions. The expected value of the ratio is 1, when the 
student's relative score on the new questions is the same as that on repeated questions. Larger 
ratios show better relative performance on the repeated than on new questions. Those students 
in the second class of the day who have ratios significantly greater than expected by chance are presumed 
to have acquired information about the test from students taking the test earlier. Since the test 
information acquired by some raises the class mean on repeated questions ($\bar{y}$), the expected 
value of the ratio for a noncheater is lowered. Two approaches in identifying cheaters are 
possible. The first is to find ratios significantly greater than the mean by accepting the upward 
biased estimate for $\bar{y}$. This would underestimate the number of cheaters. The second is to use as 
the estimate of the class mean score on repeated questions ($\bar{y}$) the mean of the first class. The 
second procedure was selected because it can be assumed that since both classes are samples 
from the same population, the estimate $\bar{y}$ would not be biased. Using $\bar{y}$ in the first class as an 
estimate for $\bar{y}$ in the second class, the expected value of $(y/\bar{y})/(x/\bar{x})$ for noncheaters in the second 
class is 1. To determine whether persons have ratios significantly greater than 1 (the expected 
mean value for noncheaters), an estimate of the variance is necessary. The larger ratios of 
cheating students in the second class will raise the class variance. The variance of the first class 
provides an unbiased estimate of the variance in ratios of the noncheaters, which can be used to 
compute the confidence interval to determine whether the individual ratios of the second class 
are significantly greater than 1 (the mean for noncheaters). 
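
The identification procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, and the one-sided normal cutoff is an assumed way of deciding that a ratio is "significantly greater than 1", since the paper does not spell out the exact test statistic. The scores in the usage note are invented.

```python
from statistics import NormalDist, mean, stdev


def relative_ratio(y, x, y_bar, x_bar):
    """(y / y_bar) / (x / x_bar): relative score on repeated vs. new questions."""
    return (y / y_bar) / (x / x_bar)


def flag_acquirers(first, second, alpha=0.01):
    """Flag second-class students whose ratio is significantly above 1.

    first, second: lists of (y, x) pairs, one per student, where y is the
    score on repeated questions and x the score on new questions.
    Assumption: a one-sided normal cutoff built from the first class's
    standard deviation of ratios (the paper does not give the exact test).
    """
    # Mean on repeated questions is taken from the FIRST class, which could
    # not have acquired test information, so it is not biased upward.
    y_bar_first = mean(y for y, _ in first)
    x_bar_first = mean(x for _, x in first)
    x_bar_second = mean(x for _, x in second)

    # Spread of ratios among noncheaters, estimated from the first class.
    first_ratios = [relative_ratio(y, x, y_bar_first, x_bar_first) for y, x in first]
    sd_noncheaters = stdev(first_ratios)

    cutoff = 1.0 + NormalDist().inv_cdf(1 - alpha) * sd_noncheaters
    return [relative_ratio(y, x, y_bar_first, x_bar_second) > cutoff for y, x in second]
```

With invented scores such as `first = [(10, 10), (12, 12), (8, 8), (11, 10), (9, 10)]` and `second = [(10, 10), (15, 10), (9, 10)]`, only the second student of the later class, who does much better on repeated than on new questions, is flagged.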

This ratio (with $\bar{y}$ estimated from the first class) was calculated for each student in the second 
class. Students performing relatively better on the repeated than on the new questions (whose 
ratios were significantly greater than one at the .01 level using the estimated variance of 
noncheaters) were classified as cheaters. 2 The number and percentage of cheaters together with 
the class size are reported below: 





                  Second Class    Number of    Percentage 
                  Size*           Cheaters     of Cheaters 

First quarter          81             18           22.2 
Second quarter         49             19           38.8 
Third quarter         131             57           43.5 
Total                 261             94           36.0 

*Sizes of the first class of the day were 83, 79 and 48. 



Theory of Test Information Acquisition 

Thirty-six percent of students enrolled in the classes for whom this type of cheating was 
possible did cheat. The extent shows that such information gathering is typical, not deviant, 
behavior. This suggests the acquisition of test information can be analyzed using the standard 
theory of search behavior. 

Time spent acquiring information to improve exam performance is presumed to increase 
the value of schooling. The maximizing strategy for a student is to allocate resources so that the 



marginal cost of acquiring test information is equal to the marginal expected return from the 
information. Time allocated to improve exam performance may be used to cheat as well as to 
study. Both activities may be viewed as investments in schooling. Ethical questions aside, time 
could be allocated to cheating and to studying, subject to the least cost condition that marginal 
physical products divided by factor prices for cheating and studying be equated. Returns to 
acquiring information generally and to cheating specifically may be almost impossible to 
estimate. Nevertheless, factors may be identified which affect the (rate of) return, the 
productivity and cost of cheating compared to studying and thereby increase or decrease 
cheating. 
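
The two optimizing conditions stated in this paragraph can be written compactly. The notation is an assumption introduced here for illustration and does not appear in the paper: $MC_I$ and $E[MR_I]$ are the marginal cost and marginal expected return of test information, $MPP_c$ and $MPP_s$ are the marginal physical products of time spent cheating and studying (in terms of exam performance), and $P_c$ and $P_s$ are the corresponding factor prices.

```latex
% Search condition: acquire information until marginal cost equals
% marginal expected return.
MC_I = E[MR_I]

% Least-cost condition for dividing time between cheating (c) and studying (s).
\frac{MPP_c}{P_c} = \frac{MPP_s}{P_s}
```

Read this way, anything that raises $MPP_c$ relative to $MPP_s$, or lowers $P_c$ relative to $P_s$, shifts time toward cheating, which is the comparative-statics logic the following paragraphs apply to each explanatory factor.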

Factors that would raise the return from exam performance and increase the student search 
for test information include the requirement of the course for a degree and being nearer to 
graduation. Part of the return from schooling may depend upon the degree itself, completion of 
the program. The return from exam performance and therefore the search for test information 
may be greater in a required course since passing is necessary to obtain the college degree. As 
the student nears graduation the return from exam performance may be greater because of 
returns from the degree itself which depend on completion of the program. Students may also 
attempt to improve grades as graduation approaches, believing prospective employers will take 
recent performance as the more reliable index of future performance. 

Reliance on cheating is also affected by the relative productivities and costs of acquiring 
test information compared to studying. Students with higher grade point averages, who are 
taking the course for the first time, or who are majors should be relatively more productive in 
studying compared to cheating than are students with lower grade points, who are repeating the 
course, or who are nonmajors. Higher grade point averages indicate ability to study. The 
inadequate previous performance of course repeaters suggests low productivity in studying. 
Majors have more experience in subject matter and probably some preference to learn course 
material. Also, cheating may have a lower return for majors who may expect to use the course 
material during later courses. It is not so likely nonmajors will encounter course material in later 
courses. The cost of this form of cheating compared to studying is lower for students with more 
friends in the first section and for students whose opportunity cost of time is greater. Since 
studying is expected to require more time than acquisition of test information, working 
students' higher opportunity cost of time may encourage cheating. Friends in the earlier class 
reduce the search time required to gain access to test information. The length of time prior to the 
second section exam limits the acquisition of test information. Also a longer period of time 
should reduce the cost to acquire test information. With less time between classes, acquiring 
test information may require additional costs due to missed classes. More time also provides 
more opportunity for test information to be acquired in casual encounters. 

Empirical Results 

Data on these factors are provided by a questionnaire administered in each class, seeking 
information on explanatory variables for each student. This information provided for the 
construction of the proposed explanatory variables. 

Students likely to acquire test information are those for whom the return to test 
performance is greater— the course is required (REQ) and graduation is nearer (GRAD) — and 
for whom cheating is more attractive than studying—working students (JOB), lower grade 
point students (GPA), course repeaters (REP), persons with a longer period of time between the 
first and second section of the day (TIM2 and TIM4), nonmajors (MAJ), persons with friends in 
the first class (FRI), and men (SEX). 

Linear discriminant analysis may be used to determine whether hypothesized variables 
distinguish between students who acquired test information and those who did not. This 
technique classifies students into two groups, those who did and who did not acquire test 
information on the basis of the hypothesized characteristics. Discriminant analysis provides a 
function which is linear in the characteristics and also provides a critical value of the function 
used for classifying a particular individual as acquiring or not acquiring test information. The 
form is estimated by regressing a dummy variable for acquisition of test information (a value of 
1 signifies information was acquired and of 0 signifies no information was acquired) on the 
characteristics, which are treated as explanatory variables. A linear discriminant function was 
obtained for acquisition of test information and is reported in Table 1. 3 

Table 1 

Linear Discriminant Function for 
Student Acquisition of Test Information 

Variable    Coefficient      Variable    Coefficient 

Constant       41.533        MAJ          -19.171 
JOB           -12.531        SEX          -16.620 
REQ             4.658        TIM2           3.982 
GRAD            6.493        TIM4           7.014 
GPA            -5.942        FRI            8.551 
REP            25.606 

In interpreting discriminant functions, a large positive coefficient on a variable indicates 
students with higher values of this variable have a greater probability of being classified as 
acquirers of test information, ceteris paribus. A large negative coefficient implies students with 
high values of this variable have a greater probability of being nonacquirers of test information, 
ceteris paribus. The direction of all impacts estimated by the coefficients in the function for 
acquisition of test information is, except for JOB, consistent with a priori hypotheses. The 
coefficients on REQ, GRAD, REP, FRI, TIM2 and TIM4 are positive, while the coefficients 
for GPA, MAJ and SEX are negative. 

The coefficients can be interpreted as the impact of the respective variables on the 
probability of an individual having acquired test information. The function suggests the 
probability that an individual acquired information increases 4.66 percentage points in a 
required course and 6.49 percentage points for being a senior. Being a course repeater increases 
the probability 25.61 percentage points. Having friends in the earlier class raises the probability 
8.55 points. In comparison to no intervening period a two-hour interval before the second 
section increases the probability of acquiring test information 3.98 percentage points; a 
four-hour period increases the probability 7.01 percentage points. The probability falls 5.94 
percentage points for each increment of one point, including the first one, in the overall grade 
point average. Other factors which reduce the probability of acquiring test information are 
being a major, by 19.17 points; currently having a job, by 12.53 points; and being a female, by 
16.62 points. 
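
As a worked illustration of this reading of the coefficients, the function in Table 1 can be evaluated for a hypothetical student. This is a sketch under stated assumptions: each dummy is taken to equal 1 when the trait is present (SEX = 1 for a female and GRAD = 1 for a senior, as the text implies), GPA is taken to be on a grade-point scale, and the helper name and the example students are invented. The critical value used for classification is not reported in the paper, so none is applied here.

```python
# Coefficients copied from Table 1 (in percentage points).
CONSTANT = 41.533
COEFFICIENTS = {
    "JOB": -12.531, "REQ": 4.658, "GRAD": 6.493, "GPA": -5.942, "REP": 25.606,
    "MAJ": -19.171, "SEX": -16.620, "TIM2": 3.982, "TIM4": 7.014, "FRI": 8.551,
}


def discriminant_score(student):
    """Linear function value for a student given as {variable: value}.

    Variables left out of the dict are treated as 0 (trait absent).
    The dummy coding is an assumption inferred from the text.
    """
    return CONSTANT + sum(COEFFICIENTS[name] * value for name, value in student.items())


# Hypothetical student: a senior repeating a required course, with friends in
# the earlier section, a four-hour gap between sections, and a 2.0 GPA.
repeater = {"REQ": 1, "GRAD": 1, "REP": 1, "FRI": 1, "TIM4": 1, "GPA": 2.0}
print(round(discriminant_score(repeater), 3))  # 81.971

# Hypothetical student: a female major with a job and a 3.5 GPA.
major = {"MAJ": 1, "SEX": 1, "JOB": 1, "GPA": 3.5}
print(round(discriminant_score(major), 3))  # -27.586
```

The second value is negative, which is a reminder that a function estimated by regressing a 0-1 dummy on the characteristics is not constrained to lie between 0 and 100 percent.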

Tvo tests of significance are available for discriminant analysis. For the discriminant 
function an F -test shows whether die two populations, the test information acquirers and the 

Table 2 

Confusion Matrix 

                                                             Percent 
Actual                    Classified Acquisition             Observations 
Acquisition            Acquired     Did Not Acquire          Correctly Classified 

Acquired                  71              23                      75.5 
Did not acquire           54             113                      67.7 

Percent of all observations correctly classified                  70.5 
nonacquirers, are significantly different. An F-value of 12.19 with 9 and 250 degrees of 
freedom indicates that the two groups are significantly different for acquisition of test 
information. Also, the value from the discriminant function serves to classify individuals into 
one of the two groups. The ability of a discriminant function to distinguish correctly between 
the two groups may be shown in a confusion matrix [8]. This indicates the number of 
individuals classified correctly and incorrectly as well as where misclassifications occurred. 
The matrix reported in Table 2 shows the performance of the discriminant function, which 
classified 70.5 percent of the students correctly. This compares favorably to the 53.9 percent 
chance criterion for being correctly classified. 
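
The percentages in Table 2 can be recomputed from the four cell counts. The sketch below does that arithmetic. It also assumes, as an illustration only, that the "chance criterion" is the proportional chance criterion p^2 + (1 - p)^2, where p is the share of actual acquirers; the text does not state the formula, although this assumption does reproduce the reported 53.9 percent. The variable names are hypothetical.

```python
# Cell counts copied from Table 2 (rows: actual group, columns: classified group).
acquired_as_acquired = 71
acquired_as_not = 23
not_as_acquired = 54
not_as_not = 113

n_acquired = acquired_as_acquired + acquired_as_not   # 94 actual acquirers
n_not = not_as_acquired + not_as_not                  # 167 actual nonacquirers
n_total = n_acquired + n_not                          # 261 students in the sample

# Percent correctly classified within each actual group and overall.
pct_acquired_correct = 100 * acquired_as_acquired / n_acquired
pct_not_correct = 100 * not_as_not / n_not
pct_all_correct = 100 * (acquired_as_acquired + not_as_not) / n_total

# Assumption: proportional chance criterion, p^2 + (1 - p)^2,
# with p the share of actual acquirers in the sample.
p = n_acquired / n_total
pct_chance = 100 * (p ** 2 + (1 - p) ** 2)

print(round(pct_acquired_correct, 1), round(pct_not_correct, 1),
      round(pct_all_correct, 1), round(pct_chance, 1))
```

Under that assumption the discriminant function improves on chance classification by roughly 17 percentage points.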
Conclusions 

Ninety-four students in a sample of 261 sought and acquired sufficient information about a 
test given earlier in the day to improve their scores significantly for the repeated questions. The 
effort put into the search was even greater than that measured since half the questions identified 
were not repeated and since some of the questions probably could have been answered from 
knowledge of the subject. The considerable search effort occurred despite the opinion of most 
students, 94.3 percent, that the examinations were fair.⁴ Students apparently sought test 
information not because they felt exams were unfair and not as an exception but rather as a 
normal part of preparing for an exam. This behavior raises several challenges. 

The problem of cheating is not so great with motivated students— majors, students with 
higher grade points, those voluntarily in a class. The problem is more serious in required 
courses, with more repeaters and students looking toward graduation. Unfortunately it is in the 
former case with fewer sections and smaller class sizes that essay exams are most plausible. It is 
in the latter case, where the problem is more serious, that large class size necessitates objective 
exams. If the same questions must be used, sections should be scheduled with no intervening 
period. 

This study measured only one type of cheating, acquiring test information from students 
who took a test earlier in the day. The extent of this type suggests other types of cheating may be 
more widespread than previously indicated in studies of self-reported behavior. There was little 
risk, as there is in some forms of cheating, in seeking information about an earlier test in the day 
or about tests from the preceding quarters. If test information is sought as aggressively from 
those earlier tests (which may even be available in complete and written form), considerable 
amounts of test information may be acquired. This makes the repeated use of test banks 
questionable, unless the banks are available to all. 

If this behavior were victimless, acquisition of test information would not be a problem, 
but it is sufficiently widespread to call into question the equity of grades. If normal distributions 
are used in assigning grades, noncheaters will be penalized. A similar problem would occur if 
grades were assigned less formally on the basis of relative performance. 

Finally, it is particularly interesting that fewer students with jobs, despite the higher 
opportunity cost of their time and the reduced time available for study, acquired test 
information. Apparently they are less interested in a high score and more interested in learning 
(exam scores and working were positively correlated, r² = .481). The students with jobs, who 
presumably are in a better position to evaluate the importance of formal schooling for the labor 
market, attach more importance to learning and less to higher scores achieved through 
acquiring test information. If the learning aspect of schooling is more valued in combination 
with working at a job, then the schooling experience may be more intensely utilized. 

Footnotes 

¹Efforts were made to restrict other types of cheating. Students were separated and exams were carefully 
proctored. The study is limited in that acquisition of information about the test earlier in the day is the only 
form of cheating measured. This certainly is not the most seriously regarded form of cheating. Some may 
not call the behavior cheating, but most would consider it unfair. 
²The concern of this paper is to identify and investigate students who cheat (acquire test information) rather 
than to see whether sufficient information was gathered by members of the second class to raise the class 
mean score significantly on the repeated questions or on the entire test. It is nevertheless interesting to note 
that for all three quarters, while there was no significant difference in scores between first and second 
sections on new questions, for the repeated questions the mean score for the second section was 
significantly higher (at the .01 level). The better average performance on the repeated questions indicates 
that cheating, the acquisition of test information, did occur, but indicates nothing about the identity and 
characteristics of cheaters. 

³Discriminant analysis is discussed in [2] and [8]. The discriminant function sometimes is reported as the 
linear regression with coefficients multiplied by a constant. It is reported here as the linear regression so 
effects of explanatory variables can be interpreted as probabilities. 

It is interesting to test the statistical significance of these coefficients in t-tests. All but two of the 10 
coefficients are significant at the 0.05 level. This procedure is questionable, however, because 
dichotomous dependent variables violate the assumption of normality of error terms. 
⁴This opinion was available from anonymous, classwide teacher evaluations administered each quarter. 
EDUCATIONAL PRODUCTION FUNCTIONS 



Solomon W. Polachek Thomas J. Kniesner Henrick J. Harwood 
Key words: Educational Production Function, Student Performance, 
Trade-offs in the Educational Process, CPES Production 
This research examines scholastic performance within 
the context of an individual's production function. A constant 
partial elasticity of substitution production function 
for academic achievement is presented and estimated with 
nonlinear maximum likelihood methods. We find that ability and 
time devoted to various aspects of the learning process are 
the most important determinants of students' accomplishments. 
Our results underscore the potential for students to compensate 
for relatively "poor" educational backgrounds by spending 
more time on study and class attendance. 



There exists in the social sciences a rich empirical 
and theoretical literature concerning the distribution of 
personal income. No matter what the approach, whether it be 
human capital (Mincer, 1974), statistical decomposition 
(Lydall, 1968), or eclectic socioeconomic analysis (Gintis, 
1971), education surfaces as a prime quantifiable determinant 
of earnings differences within a population. This link be- 
tween earning power and education underscores the importance 
of understanding the educational process. 
Two basic views characterize the current state of economic 
analysis of education. First, education is treated as 
the output produced by a school. Second, the individual student 
is viewed as using his or her own time and effort, along 
with resources purchased from a school, to produce learning. 
Both approaches may be conceptualized by what economists call 
a production function. Economic theory places restrictions 
on the mathematical form of a production function, and we 
shall first examine briefly those restrictions. Next, we 
present estimates of educational production functions with 
data from a survey of undergraduate students at the University 
of North Carolina, Chapel Hill, corroborated by estimates from 
a national survey of college students. 

Our research goals are to illuminate the value of the 
production function concept in helping to understand the edu- 
cational process and to demonstrate that the production func- 
tion concept provides some guidance in formulating public 
policy designed to influence the quality of education. 

SCHOLASTIC PERFORMANCE: A PRODUCTION FUNCTION APPROACH 

Consider an individual college student whose particular 
intelligence, amount of study time, and utilization of other 
resources lead to some level of academic success or scholastic 
performance. The mapping of this student's educational 
inputs (X) into a degree of scholastic performance (y) is a 
production function 

\[ y = F(X), \qquad (1) \]

where F(·) = the production function. 

In general, there are two mathematical properties of a 
production function worth noting. First, it is single valued, 
continuous, and well defined over the input set, yielding 
nonnegative outputs. Second, it has continuous first and 
second-order partial derivatives. (See Ferguson, 1969, 
chapter 4, for a useful background reference on the technical 
properties of production functions.) Figure 1 depicts a 
typical production function in two inputs. A first partial 
derivative of F(·) is known as a marginal product because it 
indicates the productive effect of increasing that input, all 
other input levels held constant. Positive marginal products 
will typically be observed, as negative marginal products 
indicate wasted resources. 

A key property of equation (1) is that it indicates that 
a particular level of performance may be produced in a variety 
of ways. A grade of "B" may result from intensive home study 
coupled with sporadic class attendance or from perfect attendance 
paired with little study. What this means is that a 
student may trade off class attendance for study time, or vice 
versa, in some proportion and still obtain a given grade. 
Such a trade-off is measured by a marginal rate of substitution 
(MRS). To see this characteristic of a production function, 
totally differentiate equation (1), set dy equal to zero, and 
solve for dX_j/dX_i with y and all other inputs X_k (k ≠ i, j) 
held constant, yielding 

\[ \left.\frac{dX_j}{dX_i}\right|_{\bar{y},\ \bar{X}_k,\ k \neq i,j} = -\frac{\partial y/\partial X_i}{\partial y/\partial X_j}. \qquad (2) \]

Equation (2) gives us the increase in X_j necessary to hold 
y constant if X_i is decreased by a small amount. Figure 2 
depicts two possible iso-output projections (isoquants), 
which are combinations of the inputs X_1 and X_2 that yield 
two particular levels of scholastic performance, y^0 and y^1. 
The isoquant labeled y^0 y^0, for example, is obtained from 
Figure 1 by projecting into X_1, X_2 space all (positive) 
combinations of the two inputs that lie along a locus of 
constant height (y^0). The slope of an isoquant at a particular 
point, say A in Figure 2, is a graphical representation 
of the marginal rate of substitution of X_1 for X_2. 

A standardized measure of the substitutability property 
is the elasticity of substitution (σ). This is defined as 
the percentage change in the ratio of two inputs resulting 
from a 1% change in the slope of the isoquant (marginal rate 
of substitution). In Figure 2, σ is the percent difference 
in the slope of ray OA and the slope of ray OB, divided by 
the percent difference in the slopes of isoquant (y^0 y^0) at A 
and at B. The elasticity of substitution basically reflects 
the curvature of an isoquant. The easier it is to substitute 
one input for another, the greater the value of σ is, and the 
closer to a downward sloping straight line the isoquant is. 
"Perfect" substitution is said to exist when σ goes to infinity. 
At the other end of the spectrum is a production function 
whose isoquants are right angles (σ = 0), indicating that 
inputs must be used in fixed ratios if waste is to be avoided. 
In general, production functions are classified according to 
whether σ is constant or varies along an isoquant. In our 
empirical research to follow, we make use of two different 
constant partial elasticity of substitution (CPES) production 
functions: the Cobb-Douglas and the generalized CPES (Cobb 
& Douglas, 1928). In the Cobb-Douglas function, σ is fixed 
at 1.0, a priori, and in the CPES the data are permitted to 
indicate an elasticity of substitution that takes on the same 
(constant) value for all pairs of inputs. Whereas the functions 
we employ are not the most general for the purpose of 
examining the intricacies of the educational process, they do 
provide the basic insights we seek, while remaining relatively 
easy to estimate. 



THE CPES PRODUCTION FUNCTION 

Equation (3) is a CPES production function for academic 
success. This formulation was suggested by Uzawa (1962); 
it was selected because empirical experimentation with more 
complex functions (see Ferguson, 1969, p. 110) proved unfruitful. 

\[ y = \gamma \left[ \sum_{i=1}^{n} \delta_i X_i^{-\rho} \right]^{-\mu/\rho}, \qquad (3) \]

where y = academic success 
      X_i = inputs of personal attributes, scholastic environment, 
            and study time 
      γ, μ, ρ, δ_i = (positive) parameters 
      i = 1,...,n = number of inputs. 



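This function has marginal products which are positive and 
diminishing: 

\[ MP_i = \frac{\partial y}{\partial X_i} = \frac{\mu\,\delta_i\,\gamma^{-\rho/\mu}\,y^{1+\rho/\mu}}{X_i^{\rho+1}}, \qquad i = 1,\ldots,n. \qquad (4) \]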



From the equations summarized in expression (4), the marginal 
rates of substitution of X_i for X_j are 

\[ MRS_{ij} = \frac{\delta_i}{\delta_j} \left( \frac{X_j}{X_i} \right)^{1/\sigma_{ij}}, \qquad i, j = 1,\ldots,n, \qquad (5) \]

where σ_ij = 1/(1 + ρ). 

From equation (5), it can be shown that σ_ij is the partial 
elasticity of substitution of X_i for X_j. To see this, differentiate 
equation (5) and solve for d ln(X_j/X_i) / d ln(MRS_ij), which is the 
definition of the elasticity of substitution. If one takes 
the limit of equation (3) as ρ goes to zero, the Cobb-Douglas 
production function y = γ ∏_{i=1}^{n} X_i^{δ_i} emerges. Production theory, 
as summarized in equations (4) and (5), serves as a guide in 
our subsequent empirical analysis of the marginal productivities 
of a student's resources and the potential to compensate 
for certain background deficiencies. 
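
The relations among equations (3), (4), and (5) can be checked numerically. The sketch below is an illustration only: the parameter values and input levels are hypothetical, not estimates from this study. It compares the analytic marginal product of equation (4) with a finite-difference derivative of equation (3), confirms that the marginal rate of substitution equals the ratio of marginal products, and recovers the elasticity of substitution 1/(1 + ρ) from the change in ln(X_j/X_i) relative to the change in ln(MRS_ij).

```python
import math


def cpes(x, gamma, delta, rho, mu):
    """Equation (3): y = gamma * (sum_i delta_i * x_i**(-rho)) ** (-mu / rho)."""
    inner = sum(d * xi ** (-rho) for d, xi in zip(delta, x))
    return gamma * inner ** (-mu / rho)


def marginal_product(x, i, gamma, delta, rho, mu):
    """Equation (4): marginal product of input i."""
    y = cpes(x, gamma, delta, rho, mu)
    return mu * delta[i] * gamma ** (-rho / mu) * y ** (1 + rho / mu) / x[i] ** (1 + rho)


def mrs(x, i, j, delta, rho):
    """Equation (5), using 1/sigma = 1 + rho: ratio of marginal products MP_i / MP_j."""
    return (delta[i] / delta[j]) * (x[j] / x[i]) ** (1 + rho)


# Hypothetical parameter values and input levels (not taken from the study).
gamma, delta, rho, mu = 2.0, (0.2, 0.3, 0.5), 0.25, 1.5
x = [4.0, 6.0, 9.0]

# Central finite-difference check of equation (4) for the first input.
h = 1e-6
x_up = [x[0] + h, x[1], x[2]]
x_down = [x[0] - h, x[1], x[2]]
numeric_mp0 = (cpes(x_up, gamma, delta, rho, mu) - cpes(x_down, gamma, delta, rho, mu)) / (2 * h)
analytic_mp0 = marginal_product(x, 0, gamma, delta, rho, mu)

# The MRS of equation (5) should equal the ratio of the two marginal products.
mp_ratio = analytic_mp0 / marginal_product(x, 1, gamma, delta, rho, mu)
mrs_01 = mrs(x, 0, 1, delta, rho)

# Elasticity of substitution: change in ln(x_j / x_i) per unit change in ln(MRS_ij).
x_moved = [4.0, 6.6, 9.0]
d_ln_ratio = math.log(x_moved[1] / x_moved[0]) - math.log(x[1] / x[0])
d_ln_mrs = math.log(mrs(x_moved, 0, 1, delta, rho)) - math.log(mrs_01)
sigma_numeric = d_ln_ratio / d_ln_mrs

print(analytic_mp0, numeric_mp0, mp_ratio, mrs_01, sigma_numeric)
```

With ρ = 0.25 the recovered elasticity is 0.8, the same for every pair of inputs, which is the "constant partial elasticity" property.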



EMPIRICAL IMPLEMENTATION: THE CPES 

Overview 

This section is devoted to estimation of equation (3). 
We apply nonlinear maximum likelihood techniques to a unique 
set of data containing detailed information on student personal 
attributes, time allocation, and scholastic performance. 
We utilize test scores on a standardized midterm 
examination in a large lecture course to measure scholastic 
performance. (See Bowles, 1970, for a survey of this 
issue.)¹ Because the academic achievement of individual 
students within a given class setting is analyzed, broad 
measures of school quality are implicitly held constant. So, 
we examine mainly the roles of ability and time across 
students. 

One of the peculiarities of an empirical study of a 
microeconomic educational production function is the paucity 
of data. No information exists on how much time students 
devote to study and class attendance. Because of such data 
limitations, we created a unique body of information by surveying 
students in the principles of economics course at the 
University of North Carolina at Chapel Hill. In addition, 
a national data set (Eckland, 1972, and Eckland & MacGillivray, 
1972) that contains qualitative information is used to corroborate 
the empirical results from our survey. 

At the University of North Carolina at Chapel Hill, the 
principles of economics courses are primarily taught with a 
lecture-seminar system. Students attend a large common 



¹Good grades have been shown by Wise (1975) to exert 
some positive influence on lifetime earnings independent of 
their tendency to encourage more education and thereby affect 
earnings indirectly. By contributing positively to a higher 
course grade, the production of a "good" examination score is 
consistent with greater future economic welfare for students. 
lecture presented twice a week by a faculty member and a 
small one-hour seminar given by graduate teaching assistants. 
We surveyed 227 students taking a particular macroeconomic 
principles lecture plus one of several associated seminar 
sections during the spring semester, 1975. The period of 
the survey covers the part of the course (five weeks) 
between the beginning and first hourly examination. By 
limiting the survey to such a short period of time, we have 
minimized, it is to be hoped, measurement errors stemming 
from respondents' ability to recall class attendance and 
study time. 

Data concerning sex, test score, study time, lectures 
attended, college board scores, and socioeconomic background 
were gathered; summary statistics are displayed in Table I. 
It should be noted that in conducting the survey we employed 
a double blind procedure which guaranteed that students 
involved could not be identified by name. One side-effect 
of this was that test reliability and validity measures 
could not be constructed. 

Statistical Methodology 

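We utilize as the empirical counterpart of equation (3) 

\[ G_t = \gamma \left[ \delta_1 L_t^{-\rho} + \delta_2 S_t^{-\rho} + \delta_3 C_t^{-\rho} \right]^{-\mu/\rho} + \varepsilon_t, \qquad t = 1,\ldots,T, \qquad (6) \]

where t = index of observation and 
      ε = random error term. 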


The dependent variable (G_t) is the proportion of correct 
responses on a 50-question objective test. This score 
reflects the learning that occurred during the first five 
weeks of the course and is one component of the student's 
course grade. L is the number of hours the student spent in 
class (lectures plus seminars) during the portion of the 
course prior to the examination. Study time (S) represents 
the total number of hours a student studied specifically 
for the first examination. S may best be interpreted as 
"cramming" time. The last of the three independent variables, 
C, is the individual's score on the quantitative 
portion of the SAT test, which is required of all UNC-CH 
students. To keep the empirical analysis as simple as 
possible at first, we are parsimonious in specifying the 
inputs in the grade production process. Later on, we control 
TABLE I 

Summary Statistics of UNC-CH Data 

                                                 The Sample Stratified 
        The Sample Pooled                Males                       Females 
               Standard                    Standard                    Standard 
       Mean    Deviation  Range    Mean    Deviation  Range    Mean    Deviation  Range 

G      84.90     8.66      43      85.80     8.33      43      83.75     9.00 
L      12.52     1.85      12      12.67     2.11      12      12.33     1.43 
S       5.35     3.50      19       5.05     3.54      19       5.74     3.43 
C     579.50    81.49     430     587.10    80.06     410     569.69    82.68 

[Range entries for females are only partly legible in the source: "4", "1", "50".] 

where G = numerical grade on midterm examination 

L = lectures plus seminars attended 

S = hours studied for the examination 

C = score on quantitative section of Scholastic Aptitude Test 
for sex and family background differences among students. 

We assume that ε is independently and normally distributed 
with mean zero and constant variance σ². Equation 
(6) is intrinsically nonlinear, and we choose to estimate 
its parameters by nonlinear least squares.² This technique 
finds the values of the six parameters in equation (6) that 
minimize 

\[ S(\theta) = \sum_{t=1}^{T} \{ G_t - f(X_t, \theta) \}^2, \qquad (7) \]

where T = the number of observations, 
      θ = [μ, δ_1, δ_2, δ_3, ρ, γ]', and 
      X_t = [C_t, S_t, L_t]', where 
      f(·) = production function. 

Given our assumption concerning the behavior of ε, the nonlinear 
least squares estimates of θ are also the maximum 
likelihood estimates. 

We utilize the modified Gauss-Newton method for finding 
the values of θ that minimize S(θ) (Draper & Smith, 1966, 
pp. 267-270).³ The estimate of the variance of the errors 
ε_t is 

\[ s^2 = \frac{1}{T-6}\, S(\hat{\theta}), \qquad (8) \]

where θ̂ = values of θ that minimize equation (7). 



²Equation (6) could also be estimated using a Taylor 
series approximation. See Kmenta (1967). However, the 
approximation for more than two inputs generates many collinear 
cross-product terms that make testing for parameter 
significance difficult. Direct nonlinear maximum likelihood 
estimation of equation (6) proved the most viable alternative. 

³These estimates were also checked by applying the 
method of estimation known as Marquardt's compromise (Draper 
& Smith, 1966, p. 272). Little change (except for rounding 
errors) resulted. 
Gallant (1975, p. 74) shows that in large samples, θ̂ has a 
six-dimensional multivariate normal distribution with mean 
θ (true value)⁴ and variance-covariance matrix σ²(F'F)^{-1}, 
where F is the T×6 matrix with elements 

\[ \frac{\partial f(X_t, \theta)}{\partial \theta_j}, \qquad t = 1,\ldots,T; \quad j = 1,\ldots,6. \qquad (9) \]

In hypothesis testing, the matrix (F'F)^{-1} must be 
approximated by the 6×6 matrix 

\[ \hat{C} = [F'(\hat{\theta})\,F(\hat{\theta})]^{-1}. \qquad (10) \]

A (1-α) percent confidence interval for θ_j may be constructed 
as 

\[ \hat{\theta}_j \pm t_{\alpha/2} \sqrt{s^2\, \hat{c}_{jj}}, \qquad (11) \]

where t_{α/2} is the α/2 critical value of a t-distribution with 
T-6 degrees of freedom and ĉ_jj is the jth diagonal element 
of Ĉ. From equation (11) the null hypothesis that θ_j = θ_j0 may 
be tested at the α percent level of significance by 
comparing 

\[ |t_j| = \frac{|\hat{\theta}_j - \theta_{j0}|}{\sqrt{s^2\, \hat{c}_{jj}}} \]

with t_{α/2} and rejecting the null hypothesis when 
|t_j| > t_{α/2}. 



⁴On this point, and for a more detailed presentation 
of the nonlinear least squares estimation technique, see 
Draper and Smith (1966, pp. 263-304). Also see Gallant 
(1975) for a concise presentation of hypothesis testing 
within the nonlinear regression framework. 
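
The estimation steps described above (minimizing the sum of squared residuals with a Gauss-Newton iteration, estimating the error variance, and forming t-ratios from the diagonal of the inverse cross-product matrix of derivatives) can be sketched on a toy problem. Everything below is an assumption made for illustration: the two-parameter model y = a * x**b stands in for the six-parameter CPES function, the data are synthetic, and step halving is used as one common "modified" Gauss-Newton variant, which may differ from the authors' implementation.

```python
import math


def model(x, theta):
    a, b = theta
    return a * x ** b


def gradient(x, theta):
    """One row of F: derivatives of the model with respect to (a, b)."""
    a, b = theta
    return (x ** b, a * x ** b * math.log(x))


def sum_of_squares(xs, ys, theta):
    return sum((y - model(x, theta)) ** 2 for x, y in zip(xs, ys))


def cross_products(xs, ys, theta):
    """Elements of F'F and F'r (r = residuals) at the current parameter values."""
    f11 = f12 = f22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        d1, d2 = gradient(x, theta)
        r = y - model(x, theta)
        f11 += d1 * d1
        f12 += d1 * d2
        f22 += d2 * d2
        g1 += d1 * r
        g2 += d2 * r
    return f11, f12, f22, g1, g2


def gauss_newton(xs, ys, theta, max_iter=200, tol=1e-14):
    """Gauss-Newton with step halving; stops when the fit no longer improves."""
    s_old = sum_of_squares(xs, ys, theta)
    for _ in range(max_iter):
        f11, f12, f22, g1, g2 = cross_products(xs, ys, theta)
        det = f11 * f22 - f12 * f12
        step1 = (f22 * g1 - f12 * g2) / det   # solves (F'F) step = F'r
        step2 = (f11 * g2 - f12 * g1) / det
        scale = 1.0
        while scale > 1e-12:
            trial = (theta[0] + scale * step1, theta[1] + scale * step2)
            s_new = sum_of_squares(xs, ys, trial)
            if s_new < s_old:
                break
            scale /= 2.0
        else:
            return theta                      # no improving step was found
        improvement = s_old - s_new
        theta, s_old = trial, s_new
        if improvement < tol:
            break
    return theta


# Synthetic data: y = 2 * sqrt(x) plus a small deterministic disturbance.
xs = [float(v) for v in range(1, 11)]
ys = [2.0 * x ** 0.5 + (0.05 if t % 2 == 0 else -0.05) for t, x in enumerate(xs)]

theta_hat = gauss_newton(xs, ys, (1.0, 1.0))
n_obs, n_par = len(xs), 2
s2 = sum_of_squares(xs, ys, theta_hat) / (n_obs - n_par)   # analogue of equation (8)
f11, f12, f22, _, _ = cross_products(xs, ys, theta_hat)
det = f11 * f22 - f12 * f12
c11, c22 = f22 / det, f11 / det                            # diagonal of (F'F)^{-1}
t_ratios = (theta_hat[0] / math.sqrt(s2 * c11), theta_hat[1] / math.sqrt(s2 * c22))

print(theta_hat, s2, t_ratios)
```

The t-ratios here test each parameter against zero; comparing them with a tabulated critical value for the appropriate degrees of freedom completes the test described in the text.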
Results 

Equation (6) was estimated for the entire sample 
(T=227) using the maximum likelihood technique described 
above.⁵ Regression (1) of Table II indicates an elasticity 
of substitution (σ) between factors of about 0.8.⁶ 



⁵One a priori restriction was imposed when estimating 
equation (6): 

\[ \sum_{i=1}^{3} \delta_i = 1. \]

In this case, each δ_i is interpreted as a "share" of the 
ith input in the production of scholastic performance. On 
this point see Ferguson (1969, chapter 5). 

⁶The parameters in Table II were calculated from a data 
matrix where lectures plus seminars (L) is scaled by a factor 
of 30 and hours studied (S) is scaled by a factor of 100. 
The differential magnification of these two independent 
variables facilitated calculation of the standard errors of 
the regression parameters. To see this, remember that the 
standard errors are the square roots of the diagonal 
elements of the matrix s²[F'(θ̂)F(θ̂)]^{-1} and that 

\[ f_{t\delta_i} = \frac{\partial f(X_t, \theta)}{\partial \delta_i} \qquad (t = 1,\ldots,T) \]

is one of the column vectors in the T×6 matrix F. (See 
equations (8) and (9) of the text for definitions of θ, 
f(·), and F.) Further, notice that for the production 
function given by equation (6), 

\[ \frac{\partial f(\cdot)}{\partial \delta_i} = -\frac{\mu\gamma}{\rho} \left[ \delta_1 L^{-\rho} + \delta_2 S^{-\rho} + \delta_3 C^{-\rho} \right]^{-(\mu+\rho)/\rho} X_i^{-\rho} \qquad (i = 1, 2, 3). \]

Since ρ is small, the raw values of L and S generate 
little covariation between f_{tδ_1} and f_{tδ_2}, and thus lead 
to a poorly conditioned F'F matrix. When L and S are 
rescaled, this problem is eliminated. 
TABLE II 

Nonlinear Least Squares Estimates 
CPES Production Functions 

                                                                              Computed Marginal 
                                       Parameter Estimates(a)                    Products(b) 
                              γ       δ_1     δ_2     δ_3      ρ       μ      MP_L   MP_S   MP_C 

(1) The Sample Pooled 
    T = 227                  6.05    0.10    0.05    0.84    0.24    1.74     0.72   1.04   0.05 
    SEE = 8.48              (2.61)  (1.26)  (2.62) (10.90)  (2.25)  (2.09) 

    The Sample Stratified 
(2) Males 
    T = 128                  8.14    0.17    0.04    0.79    0.16    2.32     0.78   0.54   0.04 
    SEE = 8.51              (1.70)  (1.08)  (1.09)  (4.75)  (0.14)  (0.14) 

(3) Females 
    T = 99                   [first five estimates illegible]        2.28     0.00   1.24   0.06 
    SEE = 7.38              (1.57)  (0.00)  (1.09)  (6.19)  (0.09)  (0.09) 

Production Function: G = γ[δ_1 L^{-ρ} + δ_2 S^{-ρ} + δ_3 C^{-ρ}]^{-μ/ρ} 

(a) t-values in parentheses 

(b) MP_L, MP_S, MP_C = marginal products evaluated at mean values (unscaled) 
For the complete sample, the marginal product of attending 
one extra lecture or seminar is an increase of about .7 points 
on the midterm examination. This is 25% less than the 
marginal product of an hour of study time. Finally, an extra 
100 points on the quantitative section of the SAT leads to an 
exam score higher by approximately 5 points.⁷ From this information 
we calculate marginal rates of substitution and present 
them in Table III. 

Our results indicate that less able students can indeed 
compensate for differences in ability with extra study and 
class attendance. For example, a total of about 7 additional 
hours of study (about 1½ hours per week for the five week 
period used here) are necessary to offset a 100 point SAT score 
disadvantage. If a student were taking four courses with 
similar learning structures, this result implies that an extra 
6 (12) hours per week of study compensate for a 100 (200) 
point SAT deficiency. Such variations in SAT scores are 
typical of state universities. The associated study time 
requirements, while substantial, are not unreasonable. By 
examining the rest of Table III, one can see the other 
possible tradeoffs in the production of scholastic performance. 
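
A marginal rate of substitution between two inputs is the ratio of their marginal products, so a matrix like Table III can be approximated from the marginal products reported in Table II. The sketch below uses the rounded pooled-sample values; the function and variable names are hypothetical. Because the inputs are rounded to two decimals, the ratios agree only approximately with the published table, which was presumably computed from unrounded values.

```python
# Pooled-sample marginal products copied from Table II (rounded to two decimals).
marginal_products = {"L": 0.72, "S": 1.04, "C": 0.05}


def mrs(i, j):
    """Increase in input j that holds the score constant when input i falls by one unit."""
    return marginal_products[i] / marginal_products[j]


inputs = ("L", "S", "C")
print("     " + "".join(f"{j:>8}" for j in inputs))
for i in inputs:
    cells = "".join(f"{mrs(i, j):8.2f}" if i != j else f"{'-':>8}" for j in inputs)
    print(f"{i:>5}" + cells)
```

Reading across the row for S, for instance, gives the additional lectures or SAT points that would replace one hour of study at the sample means.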

One interesting question that our data permit us to 
address is whether the possibility of compensating for background 
deficiency with extra class attendance or study time 
differs between male and female students. We thus stratify 
the sample and estimate regression equation (6) separately 
for males and females. Stratification yields samples of 128 
male and 99 female students. Little difference exists 
between the sexes with respect to μ and ρ, but large 
differences exist for γ and δ. Females have a smaller γ, 
implying that on average they score less than males, ceteris 
paribus. The sex differences in parameters in Table III 
imply that females have a higher marginal product of study 
and a lower marginal product of class attendance. One 
possible interpretation of these parameter differences 
by sex is that the test instrument measures to 



⁷The most widely used measures of ability of college 
students are the verbal and quantitative college board 
examination, though these scores may capture additional fac- 
tors. It best served our purpose here to employ the college 
board quantitative score by itself as an ability proxy, as it 
is a better independent predictor of exam score than the ver- 
bal score alone or the sum of the scores. 







TABLE III 

Marginal Rates of Substitution 
Derived from the CPES Production Function 
Estimates of Table II(a) 

                                  Input j 
                    Input i      L        S        C 

Total Sample           L         -      0.70     13.8 
N = 227                S       1.43      -       19.8 
                       C       0.07     0.05      - 

Male Sample            L         -      1.46     18.1 
N = 128                S       0.68      -       12.6 
                       C       0.05     0.08      - 

Female Sample          L         -                 b 
N = 99                 S                 -       20.2 
                       C                0.05      - 

(a) In the matrix of marginal rates of technical substitution 
[S_ij], the element S_ij represents the increase in the jth 
input required to hold output constant when the ith input 
is decreased by one unit. 

(b) MP_L = 0 
some extent memorization ("study") of facts, while class 
is devoted to concepts. If females are better at 
memorization than at concept formation (a popular prejudice), 
and the test weights facts and concepts about equally, then 
it is reasonable to find that females get relatively more 
product from study time and males from class time. (We 
are indebted to an anonymous referee for pointing this out.) 
Given their higher relative productivity in study, then, 
efficient allocation of resources within the educational 
process requires relatively more study time for females. 
The mean values of S and L in Table I are consistent with 
this. 

CORROBORATION OF EMPIRICAL RESULTS: THE COBB-DOUGLAS FUNCTION 

Because of the admitted narrowness of the UNC-CH data, 
we now examine in some detail the robustness of the empirical 
results in Table II. We utilize a national data set 
(Eckland, 1972), which is a detailed follow-up study 
(performed in 1970) of a group of high school seniors originally 
surveyed in 1955 by the Educational Testing Service 
(ETS). The follow-up is a stratified sample of 42 of 516 
originally surveyed schools and was selected to provide 
a proportionate representation of schools across the United 
States. The Eckland data contain detailed student information 
on high school courses and performance (including the number 
of math and science courses taken), an objective measure 
of ability, and self-ratings of intelligence, diligence, 
creativity, and intellectual confidence. Student background 
variables include parental education and income as well as 
number of siblings and a host of attitudinal questions. Data 
are also available on freshman through senior performance 
(grade-point averages in college and in major field of study). 

Despite a broad scope, the Eckland data contain some 
deficiencies. Certain key variables of particular interest 
to our research take on only a limited number of values 
and thus complicate interpretations. For example, ability 
(APT) is measured as a score on a 20-question test, and 
quantitative ability is not separated from verbal ability. 
Only proxies exist for other ability components. Diligence 
(DILEG) is measured retrospectively on a 4-point scale, and 
no quantitative information exists for actual time spent 
studying in and out of class. Given this lack of precise 
measurement, only qualitative implications with respect to 
the production of scholastic performance can be drawn. So, 
rather than employ the computationally cumbersome (expensive) 
CPES production function, we estimate Cobb-Douglas production 
functions via ordinary least squares⁸ to check the qualitative 
aspects of our results in section 4. 

Regressions with freshman grade-point average and overall 
college grade-point average as the dependent variables 
are presented in the first half of Table IV. These 
regressions are consistent with our earlier findings that 
(a) ability and (b) diligence or time intensity of study 
are key determinants of scholastic performance.⁹ 

In particular, the coefficients of personal attribute 
variables are not statistically significant, while those of 
ability and diligence measures are. Finally, to guard 
against the possibility that the results in Tables II and 
IV are similar because of the change in functional form of 
the regression equation, we estimated the Cobb-Douglas 
specification with the UNC-CH data. This regression is 
also presented in Table IV. Ability and diligence factors 
again dominate background and personal attribute measures 
as explanatory variables. The results in Table IV 
illustrate the (qualitative) robustness of the CPES parameter 
estimates of section 4. 
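
A Cobb-Douglas function becomes linear in its parameters after taking logarithms, which is why ordinary least squares suffices for these regressions. The sketch below illustrates that step on hypothetical, noise-free data with two inputs; it is not the authors' code, the data are invented, and the helper names are assumptions. It regresses ln y on a constant and the logs of the inputs and recovers the scale and exponent parameters.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cobb_douglas(inputs, outputs):
    """OLS of ln y on a constant and ln X_i; returns (gamma, [delta_1, ...])."""
    rows = [[1.0] + [math.log(v) for v in obs] for obs in inputs]
    targets = [math.log(y) for y in outputs]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y for the log-linear regression.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return math.exp(beta[0]), beta[1:]


# Hypothetical noise-free data generated from y = 3 * X1**0.4 * X2**0.2.
inputs = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0), (6.0, 2.0)]
outputs = [3.0 * x1 ** 0.4 * x2 ** 0.2 for x1, x2 in inputs]

gamma_hat, delta_hat = fit_cobb_douglas(inputs, outputs)
print(gamma_hat, delta_hat)
```

With real data the fitted exponents are elasticities of the outcome with respect to each input, which is how the coefficients in Table IV are read.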

A QUALIFICATION 

One qualification is necessary. Learning in one 
course represents an extremely narrow aspect of the entire 
learning process within the university. In reality, 
learning should be viewed as a larger problem in which 
resources must be allocated to the production of grades in a 
number of competing courses. Viewing learning and time 



⁸A logarithmic transformation of the Cobb-Douglas 
function (y = γ ∏_i X_i^{δ_i}) yields an estimating equation 
ln y = ln γ + Σ_i δ_i ln X_i. We assume an error term that is 
independently and normally distributed with mean zero and 
constant variance. SEX, ATTRAC, and S*AT are added as control 
variables. 

⁹The Eckland data regressions were also run without an 
adjustment for high-school grade-point average (GPAHS). Lit- 
tle difference in parameter estimates resulted. The regres- 
sions with GPAHS are reported because such a specification 
is thought to represent a more appropriate scholastic per- 
formance production function. 
TABLE IV 

Scholastic Performance Production Functions 
(Cobb-Douglas OLS Specification) 

Independent Variables 

Ability: CBQ, CBV, APT, CREAT, INTEL, INTCON, GPAHS 
Diligence: STUDY, LCT, WKST, DILEG 
Background: FINC, FED, MED 
Personal Attributes: SEX, ATTRAC, S*AT 

ECKLAND DATA 

                    Freshman GPA          Overall GPA 
                    coef     t-value      coef     t-value 

                    0.023     3.70        0.011     2.11 
                    -.066    -2.13        0.016     0.59 
                    0.028     0.55        0.122     2.77 
                    0.085     1.99        0.067     1.82 
                    0.341     8.04        0.257     7.09 
                    0.386    11.53        0.207     7.25 

                    -.202    -1.48        -.125    -1.07 
                    -.068    -1.30        -.028    -0.63 
                    0.070     1.02        -.007    -0.12 

No. Observations    655                   655 
R²                  0.35                  0.30 

UNC-CH DATA 

                    Examination Score 
                    coef     t-value      Mean Values 

                    0.230     4.44        583.53 
                    0.164     3.32        539.47 

                    0.067     2.21          3.69 
                    0.025     2.13          5.11 
                    0.034     0.58          8.38 
                    -.005    -0.44          2.25 

                    0.009     0.88         27.52 
                    0.018     0.67         14.61 
                    -.002    -0.06         14.02 

                    -.024    -1.67          0.40 

No. Observations    227 
R²                  0.31 
TABLE IV (continued) 

CBQ = score on college board (SAT) quantitative examination; range: 200-800 
CBV = score on college board (SAT) verbal examination; range: 200-800 
APT = score on ETS aptitude test; range: 0-20 
CREAT = respondent's rating of own creativity; range: 4 (very creative) to 1 (not at all creative) 
INTEL = respondent's rating of own intelligence; range: 4 (very intelligent) to 1 (not at all intelligent) 
INTCON = respondent's rating of own intellectual confidence; range: 4 (very) to 1 (not at all) 
GPAHS = high school grade-point average; range: 1 (F) to 5 (A) 
SEM = number of seminar sessions attended 
STUDY = number of hours of study for midterm examination 
LCT = number of lectures attended 
WKST = average study hours per week 
DILEG = diligence measured by whether respondent completed class assignments and recommended reading; range: 4 (almost always) to 1 (rarely or never) 
FINC = family income ($000's) 
FED = father's level of education 
MED = mother's level of education 
SEX = dummy variable: 0 (male), 1 (female) 
ATTRAC = respondent's rating of attractiveness to opposite sex; range: 3 (very) to 0 (not at all) 
S*AT = interaction of SEX and ATTRAC 

Dependent Variables 

Eckland Data: GPA measured on a four-point scale 

UNC-CH Data: midterm examination score on 100-point scale 

Functional Form: 

$\ln Y = \alpha_0 + \sum_{i=1}^{n} \alpha_i \ln X_i + \sum_{j=1}^{m} \beta_j Z_j$ 

where $\ln Y$ is the $\log_e$ of the dependent variable, $\ln X_i$ is the $\log_e$ of the ith input into the production of 
scholastic performance, and $Z_j$ is the jth conditioning factor. 
allocation within the narrow context of one course may lead 
to biases in our coefficient estimates analogous to 
simultaneous equations bias. Whereas in our model each input 
is taken as independent, a more rigorous specification would 
allow for interrelation between inputs and desired grades 
across a student's courses. Lack of data in the UNC-CH 
survey prohibits the construction of such simultaneous 
models. 



In this research we view scholastic performance within 
the context of an individual's production function. We 
find that innate ability and acquired background are the 
most important determinants of students' accomplishments. 
Moreover, time devoted to the learning process proved 
significant. In particular, class and seminar attendance, 
as well as study time, yield positive (but differing) 
marginal products. Estimates of marginal products are potentially 
useful in allocational decisions within the educational 
process. For example, suppose that an administrator has X 
dollars to spend on an activity that improves students' 
academic performances. Suppose also that by spending this 
amount on activity A, students each attend one additional 
class, whereas by spending this amount on activity B, 
students each study one additional hour. Our hypothetical 
administrator will make better use of his or her resources 
by choosing the activity with the higher marginal product. 
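
The administrator's comparison can be sketched in a few lines of Python. Under a Cobb-Douglas specification the marginal product of an input is its elasticity times the ratio of output to the input level. The elasticities, input levels, and score used below are hypothetical placeholders, not estimates from this study, and the function names are introduced only for illustration.

```python
def marginal_product(elasticity, output, level):
    """dY/dX for a Cobb-Douglas input: elasticity * Y / X."""
    if level <= 0:
        raise ValueError("input level must be positive")
    return elasticity * output / level


def better_activity(output, options):
    """Return the option whose input has the largest marginal product.

    options maps a label to (elasticity, current input level).
    """
    return max(
        options,
        key=lambda name: marginal_product(options[name][0], output, options[name][1]),
    )


# Hypothetical numbers for illustration only.
options = {
    "A: one additional class attended": (0.03, 20.0),
    "B: one additional hour of study": (0.05, 8.0),
}
choice = better_activity(75.0, options)
print(choice)
```

Because the marginal product depends on current input levels as well as on the elasticity, the ranking of activities can differ across groups of students who start from different levels of attendance or study.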

Our findings also identify some trade-offs within the 
educational process. For example, we find that approximately 
1½ extra hours of study per week compensate for a 100-point 
college board score deficiency. This result underscores 
that it is possible for a diligent student with a poor 
educational background to be academically successful, a 
fact sometimes neglected by school administrators when 
formulating admission standards. Finally, our production 
function estimates illustrate the significance of students' 
own endeavors in achieving academic success, a factor 
typically neglected when analyzing the social value of 
education. Part of the benefits often attributed to public 
expenditure are direct outputs of student effort. 
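
The trade-off between study time and test scores can be illustrated with a short Python sketch. It assumes a Cobb-Douglas form, which is a simplification and not the CPES function from which the paper's own trade-off figure is derived, and it solves for the extra study hours that hold predicted performance constant when a test score is lower by a given amount. All numerical values and the function name are hypothetical.

```python
def compensating_study_hours(a_score, a_study, score, deficit, hours):
    """Extra study hours that offset a test-score deficit.

    Assumes Y is proportional to score**a_score * hours**a_study, so that
    Y is unchanged when hours are scaled by (score / (score - deficit))
    raised to the power a_score / a_study.
    """
    if not 0 <= deficit < score:
        raise ValueError("deficit must be non-negative and smaller than score")
    ratio = score / (score - deficit)
    return hours * (ratio ** (a_score / a_study) - 1.0)


# Hypothetical elasticities and levels, for illustration only.
a_score, a_study = 0.2, 0.4
score, deficit, hours = 600.0, 100.0, 10.0
extra = compensating_study_hours(a_score, a_study, score, deficit, hours)

# Check: predicted performance is the same before and after the trade.
before = score ** a_score * hours ** a_study
after = (score - deficit) ** a_score * (hours + extra) ** a_study
print(round(extra, 3), abs(before - after) < 1e-9)
```

The required compensation grows with the ratio of the two elasticities, so a small estimated elasticity for study time implies a large number of extra hours for any given score deficit.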